A 28nm Spiking Vision Transformer Accelerator with Dual-Path Sparse Compute Core and EMA-free Self-Attention Engine for Embodied Intelligence
Chaoming Fang, Ziyang Shen, Tianyang Li, Shiqi Zhao, Fengshi Tian, Jie Yang, Mohamad Sawan
- Publication year
- 2025
- Citations
- 2
Abstract
Embodied intelligence (EAI) systems, such as autonomous robots and interactive agents, require real-time and energy-efficient processing of vision data in dynamic environments. Vision transformers have become core models for handling visual information in EAI tasks. However, they face challenges such as large input external memory access (EMA) and inefficient self-attention computation. To address these issues, we propose a hardware and software co-optimized solution. We replace the vanilla vision transformer with a spiking vision transformer, which leverages the high input sparsity of spiking neural networks (SNNs) and the efficient linear self-attention structure in spiking transformers. We further enhance the solution with three key hardware design features: 1) A dual-path sparse compute core that supports sparse processing of both differential and raw inputs in the proposed group-wise frame-differential dataflow, reducing EMA by 58%. 2) A dedicated spiking self-attention engine that not only requires 83.6% less memory space and achieves EMA-free computation, but also reduces self-attention computation by 46.5% using a first-order approximation and a dedicated transpose engine. 3) A unified 1b/8b adder-tree array that accelerates 1b matrix multiplications in spiking self-attention by 4× with only 19% area overhead. Fabricated in 28nm CMOS technology, the chip is validated on multiple EAI tasks. It achieves an energy consumption of 1.79mJ/frame on ViT-B inference, which is 4.1 times lower than previous work with the same model. It also demonstrates an overall energy efficiency of 57.7 TOPS/W, which is 34.8% higher than state-of-the-art (SOTA) vision transformer accelerators.
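
The abstract does not spell out the group-wise frame-differential dataflow, so the following is only a minimal sketch of the general idea, under stated assumptions: within a group of frames, the first frame is processed from its raw (sparse) input, and each later frame is processed from its difference to the previous frame, using the linearity of a matrix-vector product, W·x_t = W·x_(t-1) + W·(x_t − x_(t-1)). The function names `matvec_sparse` and `process_group`, the toy weights, and the operation count are hypothetical illustrations, not the chip's actual design; the count tracks skipped input columns, not external memory access.

```python
def matvec_sparse(weights, x):
    """Compute weights @ x, skipping zero entries of x.

    Returns the result and the number of nonzero inputs processed
    (a rough stand-in for compute work, assumed for illustration).
    """
    out = [0] * len(weights)
    ops = 0
    for j, xj in enumerate(x):
        if xj == 0:
            continue  # zero input: no accumulation needed
        ops += 1
        for i, row in enumerate(weights):
            out[i] += row[j] * xj
    return out, ops


def process_group(weights, frames):
    """Process one group of frames (hypothetical scheme).

    The first frame takes the raw path; later frames take the
    differential path and reuse the previous frame's output.
    """
    outputs = []
    total_ops = 0
    prev_x, prev_y = None, None
    for x in frames:
        if prev_x is None:
            # Raw path: sparse binary spikes of the first frame.
            y, ops = matvec_sparse(weights, x)
        else:
            # Differential path: delta entries are in {-1, 0, 1}.
            delta = [a - b for a, b in zip(x, prev_x)]
            dy, ops = matvec_sparse(weights, delta)
            y = [p + d for p, d in zip(prev_y, dy)]
        outputs.append(y)
        total_ops += ops
        prev_x, prev_y = x, y
    return outputs, total_ops


# Toy example: 2 output neurons, 4 binary spike inputs, 3 frames.
weights = [[1, 2, 3, 4], [0, -1, 2, 1]]
frames = [[1, 0, 1, 1], [1, 0, 1, 0], [1, 1, 1, 0]]

outputs, diff_ops = process_group(weights, frames)
raw_ops = sum(matvec_sparse(weights, x)[1] for x in frames)
print(outputs, diff_ops, raw_ops)
```

In this toy case the differential path yields the same outputs as processing every frame from its raw input, while touching fewer input columns when consecutive frames change little. How the fabricated chip groups frames and selects between the two paths is not described in the abstract.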