How to Efficiently Train Your AI Agent? Characterizing and Evaluating Deep Reinforcement Learning on Heterogeneous Platforms
Yuan Meng, Yang Yang, Sanmukh R. Kuppannagari, Rajgopal Kannan, Viktor K. Prasanna
- 发表年份
- 2020
- 引用次数
- 5
摘要
Deep Reinforcement Learning (Deep RL) is a key technology in several domains such as self-driving cars, robotics, surveillance, etc. In Deep RL, using a Deep Neural Network model, an agent learns how to interact with the environment to achieve a certain goal. The efficiency of running a Deep RL algorithm on a hardware architecture is dependent upon several factors including (1) the suitability of the hardware architecture for kernels and computation patterns which are fundamental to Deep RL; (2) the capability of the hardware architecture's memory hierarchy to minimize data-communication latency; and (3) the ability of the hardware architecture to hide overheads introduced by the deeply nested highly irregular computation characteristics in Deep RL algorithms. GPUs have been popular for accelerating RL algorithms, however, they fail to optimally satisfy the above-mentioned requirements. A few recent works have developed highly customized accelerators for specific Deep RL algorithms. However, they cannot be generalized easily to the plethora of Deep RL algorithms and DNN model choices that are available. In this paper, we explore the possibility of developing a unified framework that can accelerate a wide range of Deep RL algorithms including variations in training methods or DNN model structures. We take one step towards this goal by defining a domain-specific high-level abstraction for a widely used broad class of Deep RL algorithms - on-policy Deep RL. Furthermore, we provide a systematic analysis of the performance of state-of-the-art on-policy Deep RL algorithms on CPU-GPU and CPU-FPGA platforms. We target two representative algorithms - PPO and A2C, for application areas - robotics and games. we show that a FPGA-based custom accelerator achieves up to 24× (PPO) and 8× (A2C) speedups on training tasks, and 17× (PPO) and 2.1 × (A2C) improvements on overall throughput, respectively.
关键词
相关论文
Statistical Learning Theory
Yuhai Wu, Vladimir Vapnik
1999
Artificial intelligence: a modern approach
1995
Applied Nonlinear Control
Jean-Jacques Slotine, Weiping Li
1991
A new optimizer using particle swarm theory
R.C. Eberhart, James Kennedy
2002