PERCEPTION

Embedded Streaming Deep Neural Networks Accelerator With Applications

Ayşegül Dündar, Jonghoon Jin, Berin Martini, Eugenio Culurciello

Year: 2016
Citations: 122

Abstract

Deep convolutional neural networks (DCNNs) have become a very powerful tool in visual perception. DCNNs have applications in autonomous robots, security systems, mobile phones, and automobiles, where high throughput of the feedforward evaluation phase and power efficiency are important. Because of this increased usage, many field-programmable gate array (FPGA)-based accelerators have been proposed. In this paper, we present an optimized streaming method for a DCNN hardware accelerator on an embedded platform. The streaming method acts as a compiler, transforming a high-level representation of DCNNs into operation codes to execute applications in a hardware accelerator. The proposed method utilizes the maximum computational resources available, based on a novel scheduled routing topology that combines data reuse and data concatenation. It is tested with a hardware accelerator implemented on the Xilinx Kintex-7 XC7K325T FPGA. The system fully explores weight-level and node-level parallelizations of DCNNs and achieves a peak performance of 247 G-ops while consuming less than 4 W of power. We test our system with applications on object classification and object detection in real-world scenarios. Our results indicate high performance efficiency, outperforming all other presented platforms while running these applications.
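
The two headline figures in the abstract (a peak of 247 G-ops and less than 4 W of power) can be combined into a rough efficiency bound. The sketch below is only illustrative arithmetic: the function name `efficiency_lower_bound` is hypothetical, and it assumes that the peak throughput and the power ceiling refer to the same operating point, which the abstract does not state explicitly.

```python
def efficiency_lower_bound(peak_gops: float, max_power_w: float) -> float:
    """Return a lower bound on throughput per watt (G-ops/W).

    Dividing throughput by an upper bound on power gives a lower bound
    on efficiency. Assumption: both figures describe the same workload.
    """
    if max_power_w <= 0:
        raise ValueError("power ceiling must be positive")
    return peak_gops / max_power_w


# Figures taken from the abstract: 247 G-ops peak, less than 4 W.
bound = efficiency_lower_bound(247.0, 4.0)
print(f"more than {bound:.2f} G-ops/W")  # prints "more than 61.75 G-ops/W"
```

Because the abstract gives power only as an upper limit, the result should be read as "more than about 61.75 G-ops per watt" under the stated assumption, not as a measured efficiency.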

Keywords

Computer science, Field-programmable gate array, Convolutional neural network, Compiler, Pipeline (software), Deep learning, Embedded system, Hardware acceleration, Throughput, Computer hardware
