

An efficient implementation of deep convolutional neural networks on a mobile coprocessor

Jonghoon Jin, Vinayak Gokhale, Ayşegül Dündar, Bharadwaj Krishnamurthy, Berin Martini, Eugenio Culurciello

Year: 2014
Citations: 43

Abstract

In this paper we present a hardware-accelerated, real-time implementation of deep convolutional neural networks (DCNNs). DCNNs are becoming popular because of advances in the processing capabilities of general-purpose processors. However, DCNNs produce hundreds of intermediate results whose constant memory accesses result in inefficient use of general-purpose processor hardware. By using an efficient routing strategy, we not only maximize utilization of the available hardware resources but also obtain high performance in real-world applications. Our system, consisting of an ARM Cortex-A9 processor and a coprocessor, is capable of a peak performance of 40 G-ops/s while consuming less than 4 W of power. The entire platform has a small form factor which, combined with its high performance at low power consumption, makes it feasible to use this hardware in applications such as micro-UAVs, surveillance systems and autonomous robots.
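The two figures quoted in the abstract imply a floor on peak power efficiency. The short sketch below works through that arithmetic; the function and variable names are illustrative and do not come from the paper, and the result is only a bound derived from the stated peak throughput and power limit, not a measured value.

```python
# Figures quoted in the abstract; the names below are illustrative only.
PEAK_GOPS_PER_S = 40.0  # peak throughput in G-ops/s
POWER_BOUND_W = 4.0     # upper bound on power draw in watts ("less than 4 W")


def min_peak_efficiency(peak_gops_per_s: float, power_bound_w: float) -> float:
    """Return a lower bound on peak efficiency in G-ops/s per watt.

    The actual power draw is below the bound, so the actual peak
    efficiency is above the value returned here.
    """
    if power_bound_w <= 0:
        raise ValueError("power bound must be positive")
    return peak_gops_per_s / power_bound_w


efficiency_floor = min_peak_efficiency(PEAK_GOPS_PER_S, POWER_BOUND_W)
print(f"peak efficiency > {efficiency_floor:.1f} G-ops/s per W")
```

Since 40 G-ops/s is a peak figure, sustained efficiency on a real workload could be lower than this bound suggests.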

Keywords

Coprocessor, Computer science, Convolutional neural network, Embedded system, Routing (electronic design automation), Field-programmable gate array, Deep learning, Computer architecture, Computer hardware, Artificial intelligence

