TD3lite: FPGA Acceleration of Reinforcement Learning with Structural and Representation Optimizations

Chan-Wei Hu, Jiang Hu, Sunil P. Khatri

Year
2022
Citations
4

Abstract

Reinforcement learning (RL) is an effective and increasingly popular machine learning approach for optimization and decision-making. However, modern reinforcement learning techniques, such as deep Q-learning, often require neural network inference and training, and are therefore computationally expensive. For example, Twin-Delay Deep Deterministic Policy Gradient (TD3), a state-of-the-art RL technique, uses as many as 6 neural networks. In this work, we study the FPGA-based acceleration of TD3. To address the resource and computational overhead due to inference and training of the multiple neural networks of TD3, we propose TD3lite, an integrated approach consisting of a network sharing technique combined with bitwidth-optimized block floating-point arithmetic. TD3lite is evaluated on several robotic benchmarks with continuous state and action spaces. With only 5.7% learning performance degradation, TD3lite achieves 21× and 8× speedups compared to CPU and GPU implementations, respectively. Its energy efficiency is 26× that of the GPU implementation. Moreover, it utilizes approximately 25-40% fewer FPGA resources compared to a conventional single-precision floating-point representation of TD3.
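As a rough illustration of the block floating-point idea mentioned in the abstract, the sketch below quantizes a block of values to one shared exponent plus a small signed integer mantissa per value. It is a generic sketch and not the TD3lite implementation: the function names `bfp_quantize` and `bfp_dequantize`, the 8-bit default mantissa width, and the rounding and clamping choices are all assumptions made for the example, since the abstract does not state the bitwidths or block sizes used.

```python
import math

def bfp_quantize(block, mantissa_bits=8):
    """Quantize floats to one shared exponent and signed integer mantissas.

    Hypothetical helper for illustration; mantissa_bits is an assumed width.
    """
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 0, [0] * len(block)
    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1; e becomes the shared exponent.
    _, shared_exp = math.frexp(max_abs)
    # Scale so the largest magnitude lands near the top of the mantissa range.
    scale = 2.0 ** (mantissa_bits - 1 - shared_exp)
    limit = 2 ** (mantissa_bits - 1) - 1
    # Round to the nearest integer and clamp to the symmetric signed range.
    mantissas = [max(-limit, min(limit, round(x * scale))) for x in block]
    return shared_exp, mantissas

def bfp_dequantize(shared_exp, mantissas, mantissa_bits=8):
    """Reconstruct approximate floats from the shared exponent and mantissas."""
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [m * scale for m in mantissas]

if __name__ == "__main__":
    exp, mant = bfp_quantize([0.5, -0.25, 0.125, 0.0])
    print(exp, mant, bfp_dequantize(exp, mant))
```

Because every value in the block reuses one exponent, only the narrow integer mantissas need per-value storage and arithmetic, which is the general reason this representation tends to be cheaper in hardware than per-value floating point. The trade-off is that small values in a block dominated by a large one lose precision.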

Keywords

Reinforcement learning, Computer science, Artificial neural network, Speedup, Field-programmable gate array, Acceleration, Hardware acceleration, Inference, Artificial intelligence, Parallel computing
