Prudent Policy Gradient with Auxiliary Actor in Multi-degree-of-freedom Robotic Tasks

Xiangjian Li, Huashan Liu, Xin Cheng, Menghua Dong

Year published: 2021
Citations: 4

Abstract

Overestimation bias caused by function approximation error is a common problem in value-based reinforcement learning algorithms. The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm adopts clipped Double Q-learning and delayed policy updates to reduce the impact of this problem. Although TD3 alleviates the problem to some extent, it has still not been solved satisfactorily. Therefore, building on TD3, a novel algorithm named Prudent Policy Gradient (PPG) is proposed, in which an auxiliary actor prevents the actor from selecting excessive actions and makes the agent's behavior more prudent. This allows PPG to find a more efficient and stable policy. The experimental results show that PPG outperforms TD3 on robotic tasks from several MuJoCo benchmarks and on path-exploration tasks.
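The two TD3 mechanisms named above can be illustrated with a short sketch. The block below is a minimal Python illustration of the standard TD3 critic target, assuming deterministic actions bounded in [-max_action, max_action]. Target policy smoothing adds clipped Gaussian noise to the target action, clipped Double Q-learning takes the minimum of two target critics, and a delayed schedule updates the actor only once every `policy_delay` critic steps. It does not implement PPG's auxiliary actor, whose exact form the abstract does not specify. The helper names (`td3_target`, `should_update_actor`) and the plain-Python callables standing in for the networks are hypothetical. The defaults (gamma 0.99, policy noise 0.2, noise clip 0.5, delay 2) are commonly used TD3 settings, not values taken from this paper.

```python
import random
from typing import Callable, List, Optional

State = List[float]
Action = List[float]


def td3_target(
    reward: float,
    done: bool,
    next_state: State,
    target_actor: Callable[[State], Action],
    target_q1: Callable[[State, Action], float],
    target_q2: Callable[[State, Action], float],
    gamma: float = 0.99,
    policy_noise: float = 0.2,
    noise_clip: float = 0.5,
    max_action: float = 1.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Return y = r + gamma * (1 - done) * min(Q1'(s', a~), Q2'(s', a~)).

    Hypothetical helper: the networks are stand-in callables, not a real API.
    """
    rng = rng if rng is not None else random.Random()

    # Target policy smoothing: perturb each action dimension with Gaussian
    # noise clipped to [-noise_clip, noise_clip], then clip the result back
    # into the valid action range [-max_action, max_action].
    smoothed_action: Action = []
    for a in target_actor(next_state):
        eps = max(-noise_clip, min(noise_clip, rng.gauss(0.0, policy_noise)))
        smoothed_action.append(max(-max_action, min(max_action, a + eps)))

    # Clipped Double Q-learning: use the smaller of the two target critics,
    # which counteracts the overestimation bias of a single critic.
    q_min = min(
        target_q1(next_state, smoothed_action),
        target_q2(next_state, smoothed_action),
    )

    # Terminal transitions do not bootstrap from the next state.
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * q_min


def should_update_actor(step: int, policy_delay: int = 2) -> bool:
    """Delayed policy updates (hypothetical helper): the actor, and in TD3
    also the target networks, is updated once every `policy_delay` steps."""
    return step % policy_delay == 0
```

For example, with constant target critics returning 10.0 and 8.0, a reward of 1.0 and gamma 0.99, the target is about 1.0 + 0.99 × 8.0 = 8.92. Taking the minimum biases the target downward rather than upward, which is the trade-off TD3 accepts to curb overestimation.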

Keywords

Reinforcement learning, Computer science, Mathematical optimization, Control theory (sociology), Algorithm, Artificial intelligence, Mathematics, Control (management)