CNN inference: VLSI architecture for convolution layer for 1.2 TOPS
Mihir Mody, Manu Mathew, Shyam Jagannathan, A.J. Redfern, Jason Jones, Thorsten Lorenzen
- Year: 2017
- Citations: 6
Abstract
Deep learning techniques such as convolutional neural networks (CNNs) are increasingly popular for image classification, with broad usage across automotive, industrial, medical, and robotics applications. A typical CNN consists of multiple layers of convolution, non-linearity, and spatial pooling, plus a fully connected layer, with 2D convolutions constituting more than 95% of the overall computation. In this paper, we propose a novel systolic, fully pipelined architecture for the convolution layer that scales to high performance at very low area. The architecture is based on two techniques, a vector outer product and an intelligent data feeder, which enable three levels of parallelism (across data values, outputs, and inputs) along with pipelining of compute elements with data movement. The proposed architecture is scalable to a processing throughput of 64, 256, 512, or 1024 multiply-accumulate (MAC) operations per cycle. It can run at clock frequencies up to 600 MHz in a low-power 28 nm CMOS process node, enabling performance of 1.2 tera-operations per second (TOPS).
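The headline figure can be checked with simple arithmetic, and the outer-product idea can be illustrated in a few lines. The sketch below is not the paper's datapath. It assumes the common convention that one MAC counts as two operations (a multiply and an add), and it uses a 1x1 convolution as the simplest case in which a layer reduces to a sum of vector outer products. The function names `peak_tops` and `conv1x1_outer_product` are hypothetical and chosen only for illustration.

```python
def peak_tops(macs_per_cycle, clock_hz, ops_per_mac=2):
    """Peak throughput in tera-operations per second.

    Assumption: each MAC is counted as 2 operations (multiply + add).
    """
    return macs_per_cycle * clock_hz * ops_per_mac / 1e12


def conv1x1_outer_product(weights, data):
    """Illustrative 1x1 convolution as accumulated outer products.

    weights[k][c]: weight for output channel k, input channel c
    data[c][p]:    value of input channel c at pixel p
    Returns out[k][p], the sum over c of weights[k][c] * data[c][p].
    """
    num_out = len(weights)
    num_pix = len(data[0])
    out = [[0] * num_pix for _ in range(num_out)]
    for c in range(len(data)):                            # loop over inputs
        w_col = [weights[k][c] for k in range(num_out)]   # one weight per output
        d_row = data[c]                                   # a vector of data values
        # Outer product of w_col and d_row: every (output, data value)
        # pair is an independent MAC, which hardware could do in parallel.
        for k in range(num_out):
            for p in range(num_pix):
                out[k][p] += w_col[k] * d_row[p]
    return out


if __name__ == "__main__":
    CLOCK_HZ = 600e6  # 600 MHz, as stated in the abstract
    for macs in (64, 256, 512, 1024):
        print(f"{macs:5d} MAC/cycle: {peak_tops(macs, CLOCK_HZ):.4f} TOPS")
```

Under these assumptions, the 1024 MAC/cycle configuration at 600 MHz gives about 1.23 TOPS, consistent with the 1.2 TOPS quoted in the abstract.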