CNN inference: VLSI architecture for convolution layer for 1.2 TOPS

Mihir Mody, Manu Mathew, Shyam Jagannathan, A.J. Redfern, Jason Jones, Thorsten Lorenzen

Year: 2017
Citations: 6

Abstract

Deep learning techniques such as Convolutional Neural Networks (CNNs) are becoming popular for image classification, with broad usage across automotive, industrial, medical, and robotics applications. A typical CNN consists of multiple layers of convolution, non-linearity, and spatial pooling, plus fully connected layers, with 2D convolutions constituting more than 95% of the overall computation. In this paper, we propose a novel systolic, fully pipelined architecture for the convolution layer that scales to high performance at very low area. The architecture is based on two techniques, a vector outer product and an intelligent data feeder, which enable three levels of parallelism (across data values, outputs, and inputs) along with pipelining of compute elements with data movement. The proposed architecture scales to a processing throughput of 64/256/512/1024 multiply-and-add (MAC) operations per cycle. It can run at clock frequencies up to 600 MHz in a low-power 28 nm CMOS process node, enabling performance of 1.2 Tera-Ops (TOPS).
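The headline figure can be checked with simple arithmetic, and the vector outer product idea can be illustrated in a few lines. The sketch below is not the authors' hardware design: it assumes the common convention of counting one MAC as two operations (one multiply, one add), and it models a single filter tap as a plain software loop in which each step combines one column of weights with one row of input pixels. The function names `peak_tops` and `conv_tap_as_outer_products` are hypothetical and chosen only for illustration.

```python
def peak_tops(macs_per_cycle, clock_hz, ops_per_mac=2):
    """Peak throughput in tera-operations per second.

    Assumes each MAC counts as `ops_per_mac` operations
    (2 by the usual multiply + add convention).
    """
    return macs_per_cycle * ops_per_mac * clock_hz / 1e12


def conv_tap_as_outer_products(weights, inputs):
    """Accumulate one filter tap as a sum of vector outer products.

    weights[k][c]: weight for output channel k and input channel c.
    inputs[c][p]:  value of input channel c at pixel p.
    Returns out[k][p] = sum over c of weights[k][c] * inputs[c][p].
    """
    num_out = len(weights)
    num_in = len(inputs)
    num_pix = len(inputs[0])
    out = [[0] * num_pix for _ in range(num_out)]

    for c in range(num_in):
        # One outer product per input channel: a column of weights
        # (one per output) times a row of pixels (data values).
        w_col = [weights[k][c] for k in range(num_out)]
        x_row = inputs[c]
        # In hardware these num_out * num_pix MACs could all fire in
        # the same cycle; here they are simply two nested loops.
        for k in range(num_out):
            for p in range(num_pix):
                out[k][p] += w_col[k] * x_row[p]
    return out


if __name__ == "__main__":
    # Configurations quoted in the abstract, at the 600 MHz clock.
    for macs in (64, 256, 512, 1024):
        print(macs, "MAC/cycle:", peak_tops(macs, 600e6), "TOPS")

    weights = [[1, 2], [3, 4]]          # 2 outputs x 2 inputs
    inputs = [[1, 0, 2], [0, 1, 1]]     # 2 inputs x 3 pixels
    print(conv_tap_as_outer_products(weights, inputs))
```

Under the two-operations-per-MAC assumption, the 1024 MAC/cycle configuration at 600 MHz gives roughly 1.23 TOPS, which is consistent with the 1.2 TOPS quoted above. A full 2D convolution would repeat the outer-product accumulation for every filter tap, with the input rows shifted accordingly.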

Keywords

Computer science, Convolution (computer science), Convolutional neural network, Parallel computing, Scalability, Very-large-scale integration, Throughput, Artificial intelligence, Node (physics), Deep learning
