Off-Policy Deep Reinforcement Learning Based on Steffensen Value Iteration

Yuhu Cheng, Lin Chen, C. L. Philip Chen, Xuesong Wang

Year of publication: 2020
Citations: 15

Abstract

As an important machine learning method, deep reinforcement learning (DRL) has developed rapidly in recent years and has achieved breakthrough results in many fields, such as video games, natural language processing, and robot control. However, because of the inherent trial-and-error learning mechanism of reinforcement learning and the time-consuming training of deep neural networks, DRL converges very slowly, which limits its real-world applications. In this article, to improve the convergence speed of DRL, we propose a novel Steffensen value iteration (SVI) method that treats value iteration as a fixed-point iteration and applies the Steffensen iteration to the value function iteration of off-policy DRL. SVI is theoretically proved to converge, and to converge faster than Bellman value iteration. SVI is versatile and can be easily combined with existing off-policy RL algorithms. We propose two fast off-policy DRL algorithms, SVI-DDQN and SVI-TD3, obtained by combining SVI with DDQN and TD3, respectively. Experiments on several discrete-action and continuous-action tasks from the Atari 2600 and MuJoCo platforms demonstrate that the proposed SVI-based DRL algorithms achieve a higher average reward in a shorter time than the comparison algorithms.
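To make the fixed-point view concrete, the following is a minimal sketch of the classical Steffensen iteration applied to a scalar fixed-point problem x = g(x), compared with plain fixed-point iteration. It is not the SVI algorithm of the article, which operates on value functions approximated by deep networks; the function names, the choice g = cos, the starting point, and the tolerance are all assumptions made purely for illustration.

```python
import math


def fixed_point_iterate(g, x0, tol=1e-10, max_iter=1000):
    """Plain fixed-point iteration x <- g(x).

    Returns (estimate, number of iterations performed).
    """
    x = x0
    for k in range(1, max_iter + 1):
        x_next = g(x)
        if abs(x_next - x) < tol:
            return x_next, k
        x = x_next
    return x, max_iter


def steffensen_iterate(g, x0, tol=1e-10, max_iter=1000):
    """Steffensen iteration for the fixed point of g.

    Each step evaluates g twice and applies the Aitken delta-squared
    correction: x <- x - (g(x) - x)^2 / (g(g(x)) - 2 g(x) + x).
    Returns (estimate, number of iterations performed).
    """
    x = x0
    for k in range(1, max_iter + 1):
        gx = g(x)
        ggx = g(gx)
        denom = ggx - 2.0 * gx + x
        if denom == 0.0:
            # The correction is undefined; fall back to the latest iterate.
            return ggx, k
        x_next = x - (gx - x) ** 2 / denom
        if abs(x_next - x) < tol:
            return x_next, k
        x = x_next
    return x, max_iter


# Illustrative problem (an assumption, not from the article): x = cos(x).
plain_root, plain_steps = fixed_point_iterate(math.cos, 1.0)
steff_root, steff_steps = steffensen_iterate(math.cos, 1.0)
print(f"plain:      x = {plain_root:.10f} after {plain_steps} iterations")
print(f"steffensen: x = {steff_root:.10f} after {steff_steps} iterations")
```

On this toy problem both loops approach the same fixed point, and the Steffensen loop needs far fewer iterations even though each of its iterations costs two evaluations of g. In the article's setting the role of g is played by the Bellman value update, which is what motivates applying the same acceleration to value iteration.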

Keywords

Reinforcement learning; Computer science; Convergence (economics); Bellman equation; Mathematical optimization; Artificial neural network; Artificial intelligence; Fixed point; Algorithm; Control theory (sociology)
