
PERCEPTION

Work in Progress: Real-time Transformer Inference on Edge AI Accelerators

Brendan Reidy, Mohammadreza Mohammadi, Mohammed Elbtity, Heath Smith, Ramtin Zand

Year of publication: 2023
Citations: 12

Abstract

Transformer models have become a dominant architecture in machine learning. From natural language processing to more recent computer vision applications, Transformers have shown remarkable results and established a new state of the art in many domains. However, this increase in performance has come at the cost of ever-increasing model sizes that require more resources to deploy. Machine learning (ML) models are used in many real-world systems, such as robotics, mobile devices, and Internet of Things (IoT) devices, that require fast inference with low energy consumption. For battery-powered devices, lower energy consumption directly translates into longer battery life. To address these issues, several edge AI accelerators have been developed. Among these, the Coral Edge TPU has shown promising results for image classification while maintaining very low energy consumption. Many of these devices, including the Coral TPU, were originally designed to accelerate convolutional neural networks, which makes deploying Transformers on them challenging. Here, we propose a methodology to deploy Transformers on the Edge TPU. We provide extensive latency, power, and energy comparisons among the leading edge devices and show that our methodology allows for real-time inference of Transformers while maintaining the lowest power and energy consumption among the edge devices compared.
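To make the link between latency, power, energy, and battery life concrete, the following is a minimal sketch of the underlying arithmetic. The function names and all numeric values are hypothetical illustrations, not measurements or code from the paper, and the model ignores idle power draw.

```python
def energy_per_inference_mj(avg_power_w: float, latency_ms: float) -> float:
    """Energy per inference in millijoules (watts * milliseconds = millijoules)."""
    return avg_power_w * latency_ms


def inferences_per_charge(battery_wh: float, energy_mj: float) -> float:
    """Inferences one battery charge can sustain, ignoring idle draw."""
    battery_mj = battery_wh * 3600.0 * 1000.0  # Wh -> J -> mJ
    return battery_mj / energy_mj


def meets_deadline(latency_ms: float, deadline_ms: float) -> bool:
    """A simple real-time check: inference must finish within the deadline."""
    return latency_ms <= deadline_ms


# Hypothetical device: 2 W average power, 10 ms latency, 10 Wh battery.
energy = energy_per_inference_mj(avg_power_w=2.0, latency_ms=10.0)
count = inferences_per_charge(battery_wh=10.0, energy_mj=energy)
print(f"{energy} mJ per inference, {count:.0f} inferences per charge")
print("meets a 33 ms frame deadline:", meets_deadline(10.0, 33.0))
```

Under these assumed numbers, halving either the average power or the latency halves the energy per inference and doubles the number of inferences per charge, which is why the comparison reports all three quantities.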

Keywords

Computer science, Edge device, Transformer, Inference, Artificial intelligence, Energy consumption, Convolutional neural network, Edge computing, Deep learning, Embedded system
