
A 28nm Spiking Vision Transformer Accelerator with Dual-Path Sparse Compute Core and EMA-free Self-Attention Engine for Embodied Intelligence

Chaoming Fang, Ziyang Shen, Tianyang Li, Shiqi Zhao, Fengshi Tian, Jie Yang, Mohamad Sawan

Publication year
2025
Citations
2

Abstract

Embodied intelligence (EAI) systems, such as autonomous robots and interactive agents, require real-time and energy-efficient processing of vision data in dynamic environments. Vision transformers have become core models for handling visual information in EAI tasks. However, they face challenges such as large input external memory access (EMA) and inefficient self-attention computation. To address these issues, we propose a hardware and software co-optimized solution. We replace the vanilla vision transformer with a spiking vision transformer, which leverages the high input sparsity of spiking neural networks (SNNs) and the efficient linear self-attention structure in spiking transformers. We further enhance the solution with three key hardware design features: 1) A dual-path sparse compute core that supports sparse processing of both differential and raw inputs in the proposed group-wise frame-differential dataflow, reducing EMA by 58%. 2) A dedicated spiking self-attention engine that not only requires 83.6% less memory space and achieves EMA-free computation, but also reduces self-attention computation by 46.5% using a first-order approximation and a dedicated transpose engine. 3) A unified 1b/8b adder-tree array that accelerates 1b matrix multiplications in spiking self-attention by 4× with only 19% area overhead. Fabricated in 28nm CMOS technology, the chip is validated on multiple EAI tasks. It achieves an energy consumption of 1.79mJ/frame on ViT-B inference, which is 4.1 times lower than previous work with the same model. It also demonstrates an overall energy efficiency of 57.7 TOPS/W, which is 34.8% higher than state-of-the-art (SOTA) vision transformer accelerators.
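Two ideas in the abstract can be illustrated with a small sketch. It is not the authors' implementation or the chip's dataflow; the matrices, the helper names (`matmul`, `transpose`, `sparse_matvec`), and the single-layer setting are all assumptions chosen for illustration. The first part shows why self-attention without softmax can be computed in linear form: by associativity, (Q K^T) V equals Q (K^T V), so the N x N attention map never has to be stored. The second part shows a generic frame-differential update for a linear layer, W x_t = W x_(t-1) + W (x_t - x_(t-1)), where only the nonzero entries of the difference need processing. The paper's group-wise scheme, which also handles raw inputs, may differ in detail.

```python
# Illustrative sketch only: toy integer matrices, pure Python, hypothetical helper names.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

# --- 1) Softmax-free attention can be reordered into a linear form ---
# Binary spike matrices: 4 tokens x 3 channels (made-up values).
Q = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
K = [[0, 1, 1], [1, 0, 0], [1, 1, 0], [0, 1, 0]]
V = [[1, 1, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

quadratic = matmul(matmul(Q, transpose(K)), V)  # builds a 4 x 4 (N x N) intermediate
linear = matmul(Q, matmul(transpose(K), V))     # builds a 3 x 3 (d x d) intermediate
same_attention = quadratic == linear            # exact for integers, by associativity

# --- 2) Frame-differential update for a linear layer ---
def sparse_matvec(w, x):
    """Compute w @ x while skipping zero inputs; also return how many inputs were used."""
    out = [0] * len(w)
    used = 0
    for j, xj in enumerate(x):
        if xj == 0:
            continue  # a zero input contributes nothing, so it is skipped
        used += 1
        for i in range(len(w)):
            out[i] += w[i][j] * xj
    return out, used

W = [[2, -1, 0, 3], [1, 0, 4, -2]]  # hypothetical 2 x 4 weight matrix
prev_frame = [1, 1, 1, 0]           # spikes from the previous frame
curr_frame = [1, 1, 0, 0]           # spikes from the current frame

y_prev, _ = sparse_matvec(W, prev_frame)
y_direct, direct_nonzeros = sparse_matvec(W, curr_frame)

# Only the frame difference is processed, then added to the cached previous output.
delta = [c - p for c, p in zip(curr_frame, prev_frame)]
y_delta, delta_nonzeros = sparse_matvec(W, delta)
y_incremental = [a + b for a, b in zip(y_prev, y_delta)]
```

In this toy case the differential path touches one input instead of two, and both paths produce the same output. The savings on real data depend on how similar consecutive frames are, which the sketch does not model.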

Keywords

Transformer; Embodied cognition; Dual (grammatical number); Computer science; Path (computing); Artificial intelligence; Electrical engineering; Engineering; Programming language; Art
