PERCEPTION

2.5 A 16nm 5.7TOPS CNN Processor Supporting Bi-Directional FPN for Small-Object Detection on High-Resolution Videos

Yu-Chun Ding, Chia-Yu Chang, Chun-Yeh Lin, Hui-Yun Tsai, Hao-Jiun Tu, Yu-Ching Su, Tsung-Han Hsieh, Wen-Ching Chen, Nian-Shyang Chang, Chun-Pin Lin, Chi-Shi Chen, Chao-Tsung Huang

Year: 2025
Citations: 1

Abstract

Object detection is vital in intelligent systems such as autonomous vehicles, UAVs, VR/AR, and smart robots. Detecting small objects is particularly crucial and can be life-saving for ADAS, as it helps maintain awareness of distant objects to ensure safe following distances. As illustrated in Fig. 2.5.1, distant pedestrians may occupy fewer than one thousand pixels in a 2M-resolution image, making them hard to detect with the low-resolution inputs and shallow networks supported by prior works. EfficientDet-D3 [6] significantly improves detection precision for small objects by combining a high-resolution 896×896 input with a deep 77-layer backbone and an advanced multi-layer stacked bidirectional feature pyramid network (Bi-FPN). Its mean average precision for small objects (mAPs) reaches 28.7% for object areas under 32×32 pixels in the COCO dataset [7]. However, this precision comes with a substantial increase in memory costs on existing accelerators, which limits the wide deployment of accurate small-object detection. Additionally, the deep backbone involves diverse operations, and their varied operational behaviors reduce hardware efficiency. Inference with deeper networks (more layers) and higher resolution also requires more computing power. In this work, we present a memory-efficient and energy-efficient CNN processor that supports deep-layer backbone inference with Bi-FPN on high-resolution inputs for high-precision small-object detection.
This chip features: 1) a flow-model co-optimized Bi-FPN implementation with orientation-interleaved causally-processed (OICP) modelling to reduce the memory cost for feature maps (FMs); 2) bandwidth-optimized backbone scheduling with FM re-accessing and re-computing (RARC) to reduce external memory access (EMA); 3) a reconfigurable tensor engine (RTE) to improve compute utilization for diverse operations; and 4) a low-toggle sign-magnitude-two's-complement (SMTC) processing element (PE) design to reduce power consumption for MACs.
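
The abstract does not describe the internals of the SMTC PE, so the following is only a rough illustration of the general reason a sign-magnitude encoding can toggle fewer bits than two's complement when small values alternate in sign. It is a minimal sketch, not the chip's design: the 8-bit width, the sample sequence, and the helper names `twos_complement`, `sign_magnitude`, and `count_toggles` are all assumptions made for this example.

```python
def twos_complement(value: int, bits: int = 8) -> int:
    """Unsigned bit pattern of `value` in two's complement (assumed 8-bit width)."""
    return value & ((1 << bits) - 1)


def sign_magnitude(value: int, bits: int = 8) -> int:
    """Bit pattern with the MSB as sign and the remaining bits as magnitude."""
    max_magnitude = (1 << (bits - 1)) - 1
    if abs(value) > max_magnitude:
        raise ValueError(f"{value} does not fit in {bits}-bit sign-magnitude")
    sign = 1 if value < 0 else 0
    return (sign << (bits - 1)) | abs(value)


def count_toggles(patterns: list[int]) -> int:
    """Sum of Hamming distances between consecutive bit patterns."""
    return sum(bin(a ^ b).count("1") for a, b in zip(patterns, patterns[1:]))


# Hypothetical operand stream: small magnitudes with alternating sign.
samples = [1, -1, 2, -2, 1, -1]

tc_toggles = count_toggles([twos_complement(v) for v in samples])
sm_toggles = count_toggles([sign_magnitude(v) for v in samples])

print(f"two's complement toggles: {tc_toggles}")  # 35 for this sequence
print(f"sign-magnitude toggles:   {sm_toggles}")  # 9 for this sequence
```

In two's complement, flipping the sign of a small value flips nearly every bit, whereas in sign-magnitude it flips only the sign bit. Fewer toggles on a bus or multiplier input generally mean less dynamic power, which is the kind of saving a low-toggle PE aims for. How the actual PE combines the two encodings is not stated in the abstract.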

Keywords

Computer science, Object detection, Object (grammar), Computer vision, Artificial intelligence, Image resolution, Object based, Resolution (logic), High resolution, Pattern recognition (psychology)
