Work in Progress: Real-time Transformer Inference on Edge AI Accelerators
Brendan Reidy, Mohammadreza Mohammadi, Mohammed Elbtity, Heath Smith, Z Ramtin
- 发表年份
- 2023
- 引用次数
- 12
摘要
Transformer models have become a dominant architecture in the world of machine learning. From natural language processing to more recent computer vision applications, Transformers have shown remarkable results and established a new state-of-the-art in many domains. However, this increase in performance has come at the cost of ever-increasing model sizes requiring more resources to deploy. Machine learning (ML) models are used in many real-world systems, such as robotics, mobile devices, and internet of things (IoT) devices, that require fast inference with low energy consumption. For batterypowered devices, lower energy consumption directly translates into longer battery life. To address these issues, several edge AI accelerators have been developed. Among these, the Coral Edge TPU has shown promising results for image classification while maintaining very low energy consumption. Many of these devices, including the Coral TPU, were originally designed to accelerate convolutional neural networks, making deployment of Transformers challenging. Here, we propose a methodology to deploy Transformers on Edge TPU. We provide extensive latency, power, and energy comparisons among the leading edge devices and show that our methodology allows for real-time inference of Transformers while maintaining the lowest power and energy consumption of other edge devices on the market.
关键词
相关论文
Statistical Learning Theory
Yuhai Wu, Vladimir Vapnik
1999
Artificial intelligence: a modern approach
1995
Applied Nonlinear Control
Jean-Jacques Slotine, Weiping Li
1991
A new optimizer using particle swarm theory
R.C. Eberhart, James Kennedy
2002