Re-architecting the on-chip memory sub-system of machine-learning accelerator for embedded devices

Ying Wang, Huawei Li, Xiaowei Li

Year: 2016
Citations: 30

Abstract

The rapid development of deep learning is enabling a wealth of novel applications, such as image and speech recognition, for embedded systems, robotics, and smart wearable devices. However, typical deep learning models such as deep convolutional neural networks (CNNs) consume so much on-chip storage and high-throughput compute resources that they cannot be easily handled by mobile or embedded devices with tight silicon and power budgets. To enable large CNN models on mobile devices and on more cutting-edge devices for IoT or cyber-physical applications, we propose an efficient on-chip memory architecture for CNN inference acceleration and show its application to our in-house general-purpose deep learning accelerator. The redesigned on-chip memory subsystem, Memsqueezer, includes an active weight buffer set and a data buffer set that employ specialized compression methods to reduce the footprint of the CNN weight set and data set, respectively. The Memsqueezer buffers compress the data and weight sets according to their distinct features, and they also include a built-in redundancy detection mechanism that actively scans the working set of a CNN to boost inference performance by eliminating data redundancy. Our experiments show that CNN accelerators with Memsqueezer buffers achieve more than 2× performance improvement and reduce energy consumption by 80% on average compared with a conventional buffer design with the same area budget.
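
The abstract does not specify which compression scheme the Memsqueezer buffers use. As a purely illustrative sketch of how redundancy in a CNN data set can shrink a buffer's footprint, the Python below applies zero-run-length encoding to a sparse activation vector. The choice of zero-run-length encoding and the names `compress_zero_runs` and `decompress_zero_runs` are assumptions made for this example, not the authors' method.

```python
from typing import List, Tuple

def compress_zero_runs(values: List[int]) -> List[Tuple[int, int]]:
    """Encode values as (zeros_before, nonzero_value) pairs.

    Trailing zeros are stored as a final (count, 0) pair. Illustrative
    only: not the scheme described in the paper.
    """
    encoded: List[Tuple[int, int]] = []
    zeros = 0
    for v in values:
        if v == 0:
            zeros += 1              # extend the current run of zeros
        else:
            encoded.append((zeros, v))
            zeros = 0
    if zeros:
        encoded.append((zeros, 0))  # marker for trailing zeros
    return encoded

def decompress_zero_runs(encoded: List[Tuple[int, int]]) -> List[int]:
    """Invert compress_zero_runs."""
    values: List[int] = []
    for zeros, v in encoded:
        values.extend([0] * zeros)
        if v != 0:                  # a value of 0 only marks trailing zeros
            values.append(v)
    return values

# Hypothetical sparse activations, e.g. after a ReLU layer.
activations = [0, 0, 5, 0, 0, 0, 3, 7, 0, 0]
packed = compress_zero_runs(activations)
restored = decompress_zero_runs(packed)
```

Here ten stored values collapse to four pairs, and decoding recovers the original vector exactly. Whether such a scheme saves space in hardware depends on the pair width and the sparsity of the data, which this sketch does not model.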

Keywords

Computer science, Convolutional neural network, Deep learning, Embedded system, Redundancy (engineering), Artificial intelligence, Memory footprint, Power budget, Edge device, Power (physics)
