PERCEPTION

Inf4Edge: Automatic Resource-aware Generation of Energy-efficient CNN Inference Accelerator for Edge Embedded FPGAs

Ali Jahanshahi, Rasool Sharifi, Mohammadreza Rezvani, Hadi Zamani

Publication year
2021
Citations
10

Abstract

Convolutional Neural Networks (CNNs) have achieved great success in a large number of applications and are among the most powerful and widely used techniques in computer vision. CNN inference is very computation-intensive, which makes it difficult to integrate into resource-constrained embedded devices such as smartphones, smart glasses, and robots. Alongside inference latency, energy efficiency is also of great importance for embedded devices with limited computational, storage, and energy resources. Embedded FPGAs, as a fast and energy-efficient solution, are among the widely used platforms for accelerating CNN inference. However, the difficulty of programming them and their limited hardware resources have made them a less attractive option for users. In this paper, we propose Inf4Edge, an automated framework for designing CNN inference accelerators on small embedded FPGAs. The proposed framework seamlessly generates a CNN inference accelerator that fits the target FPGA using different resource-aware optimization techniques. Transferring data back and forth to and from the FPGA introduces latency and energy consumption. To avoid this data transfer overhead, we keep all of the data in the FPGA's on-chip memory, which makes the generated inference accelerator faster and more energy-efficient. Given a high-level description of the CNN and a data set, the framework builds and trains the model, and generates an optimized CNN inference accelerator for the target FPGA. As a case study, we use 16-bit fixed-point data in the generated CNN inference accelerator on a small FPGA and compare it to the same software model running on the FPGA's ARM processor. Using the 16-bit fixed-point data type results in ~2% accuracy loss in the CNN inference accelerator. In return, we get up to 15.86× speedup performing inference on the FPGA.
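
To make the 16-bit fixed-point trade-off mentioned in the abstract concrete, here is a minimal sketch of fixed-point quantization in Python. The abstract does not state how the 16 bits are split, so the Q8.8 layout (8 fractional bits), the saturation behavior, and the helper names `to_fixed`, `to_float`, and `fixed_dot` are all assumptions made for illustration. This is not the Inf4Edge implementation.

```python
# Illustrative sketch only: a hypothetical Q8.8 format (16-bit signed,
# 8 fractional bits). The bit split is an assumption, not taken from the paper.
FRAC_BITS = 8
SCALE = 1 << FRAC_BITS
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def saturate(q: int) -> int:
    """Clamp an integer to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, q))


def to_fixed(x: float) -> int:
    """Quantize a float to a saturating 16-bit fixed-point integer."""
    return saturate(round(x * SCALE))


def to_float(q: int) -> float:
    """Convert a fixed-point integer back to a float."""
    return q / SCALE


def fixed_dot(weights, activations) -> int:
    """Dot product of two float lists computed in fixed point.

    Products carry 2 * FRAC_BITS fractional bits, so they are summed in a
    wide accumulator and shifted back to Q8.8 once at the end.
    """
    acc = 0
    for w, a in zip(weights, activations):
        acc += to_fixed(w) * to_fixed(a)
    return saturate(acc >> FRAC_BITS)


if __name__ == "__main__":
    q = to_fixed(0.1)
    print(q, to_float(q))  # quantized value and its rounding error
    print(to_float(fixed_dot([0.5, 0.25], [2.0, 4.0])))
```

With 8 fractional bits, each in-range value is rounded to the nearest multiple of 1/256, so the per-value rounding error is at most 1/512. Errors of this kind accumulating across layers are one plausible source of the small accuracy loss that reduced-precision inference typically incurs.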

Keywords

Computer science, Inference, Field-programmable gate array, Convolutional neural network, Embedded system, Overhead (engineering), Latency (audio), Efficient energy use, Edge device, Hardware acceleration
