Prudent Policy Gradient with Auxiliary Actor in Multi-degree-of-freedom Robotic Tasks
Xiangjian Li, Huashan Liu, Xin Cheng, Menghua Dong
- Publication year
- 2021
- Citations
- 4
Abstract
Overestimation bias caused by function approximation error is a common problem in value-based reinforcement learning algorithms. The Twin Delayed Deep Deterministic policy gradient (TD3) algorithm adopts clipped Double Q-learning and delayed policy updates to reduce the impact of this problem. Although TD3 offers some improvement, the problem is still not fully solved. Building on TD3, a novel algorithm named Prudent Policy Gradient (PPG) is therefore proposed, in which an auxiliary actor prevents the actor from selecting excessive actions and makes the agent's behavior more prudent. This allows PPG to find a more efficient and stable policy. Experimental results show that PPG outperforms TD3 on robotic tasks from several MuJoCo benchmarks and on path exploration tasks.
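To make the two TD3 ingredients named in the abstract concrete, here is a minimal sketch of a clipped Double Q-learning target and a delayed policy update schedule. It illustrates standard TD3 only, not the PPG auxiliary actor, whose details the abstract does not give. The function names, the `gamma` default, and the `policy_delay` default are assumptions chosen for illustration, and TD3's target policy smoothing is omitted for brevity.

```python
from typing import List, Sequence


def clipped_double_q_target(
    rewards: Sequence[float],
    dones: Sequence[bool],
    q1_next: Sequence[float],
    q2_next: Sequence[float],
    gamma: float = 0.99,  # assumed discount factor
) -> List[float]:
    """Compute y = r + gamma * (1 - done) * min(Q1', Q2') per transition.

    q1_next and q2_next are the two target critics' estimates for the
    next state-action pair. Taking the minimum counteracts overestimation.
    """
    targets = []
    for r, done, q1, q2 in zip(rewards, dones, q1_next, q2_next):
        not_done = 0.0 if done else 1.0  # no bootstrapping past terminal states
        targets.append(r + gamma * not_done * min(q1, q2))
    return targets


def should_update_actor(step: int, policy_delay: int = 2) -> bool:
    """Delayed policy updates: the actor is updated once every
    policy_delay critic updates (policy_delay is an assumed value)."""
    return step % policy_delay == 0


if __name__ == "__main__":
    y = clipped_double_q_target(
        rewards=[1.0, 0.5],
        dones=[False, True],
        q1_next=[10.0, 3.0],
        q2_next=[8.0, 4.0],
        gamma=0.5,
    )
    print(y)
```

In the first transition the smaller critic estimate (8.0) is used for bootstrapping; in the second, the terminal flag removes the bootstrap term, so the target is just the reward.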