LEARNING

Prudent Policy Gradient with Auxiliary Actor in Multi-degree-of-freedom Robotic Tasks

Xiangjian Li, Huashan Liu, Xin Cheng, Menghua Dong

Publication year
2021
Citations
4

Abstract

Overestimation bias caused by function approximation error is a common problem in value-based reinforcement learning algorithms. The Twin Delayed Deep Deterministic policy gradient (TD3) algorithm adopts clipped Double Q-learning and delayed policy updates to reduce the impact of this problem. Although TD3 alleviates the problem, it does not fully solve it. Building on TD3, a novel algorithm named Prudent Policy Gradient (PPG) is therefore proposed, in which an auxiliary actor prevents the actor from selecting excessive actions and makes the agent’s behavior more prudent. This allows PPG to find a more efficient and stable policy. Experimental results show that PPG outperforms TD3 on robotic tasks from several MuJoCo benchmarks and on path exploration tasks.
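
For readers unfamiliar with the TD3 baseline mentioned in the abstract, the two mechanisms it names can be sketched as follows. This is a minimal NumPy illustration of the standard TD3 target and update schedule, not the PPG algorithm itself: the abstract does not describe how the auxiliary actor works, so it is not modeled here. The function names, the `gamma` default, and the `policy_delay` default are assumptions chosen for illustration.

```python
import numpy as np

def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped Double Q-learning target as used in TD3 (illustrative sketch).

    Taking the element-wise minimum of the two target critics' estimates
    counteracts the overestimation bias of a single critic.
    """
    reward = np.asarray(reward, dtype=float)
    done = np.asarray(done, dtype=float)
    min_q = np.minimum(np.asarray(q1_next, dtype=float),
                       np.asarray(q2_next, dtype=float))
    # Terminal transitions (done == 1) do not bootstrap from the next state.
    return reward + gamma * (1.0 - done) * min_q

def should_update_actor(step, policy_delay=2):
    """Delayed policy updates: the actor is updated once every
    `policy_delay` critic updates (hypothetical helper)."""
    return step % policy_delay == 0

# Example batch of two transitions; the second one is terminal.
targets = clipped_double_q_target(
    reward=[1.0, 0.5], done=[0, 1],
    q1_next=[10.0, 4.0], q2_next=[8.0, 6.0], gamma=0.9,
)
```

In the first transition the smaller critic estimate (8.0) is used for bootstrapping, while the terminal second transition keeps only its reward.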

Keywords

Reinforcement learning, Computer science, Mathematical optimization, Control theory (sociology), Algorithm, Artificial intelligence, Mathematics, Control (management)
