

Optimal Control-Based Baseline for Guided Exploration in Policy Gradient Methods

Xubo Lyu, Site Li, Seth Siriya, Ye Pu, Mo Chen

Year of publication: 2020
Access: Open access

Abstract

This paper presents a novel optimal control-based baseline function for the policy gradient method in deep reinforcement learning (RL). The baseline is obtained by computing the value function of an optimal control problem formulated to be closely related to the RL task. Whereas the traditional baseline aims to reduce the variance of policy gradient estimates, our work uses the optimal control value function to give the baseline an additional role: providing guided exploration during policy learning. This aspect is less discussed in prior work. We validate our baseline on robot learning tasks, showing its effectiveness in guided exploration, particularly in sparse reward environments.
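To make the role of a baseline concrete, the following is a minimal sketch of how a state-dependent baseline enters the policy gradient weights: each log-probability gradient is weighted by the reward-to-go minus the baseline at that state. It is not the authors' implementation. The function names, the discount factor, and in particular `toy_value_fn` (a negative distance-to-goal stand-in for an optimal control value function) are assumptions made for illustration only.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Reward-to-go G_t = sum_k gamma^k * r_{t+k} for one trajectory."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages_with_baseline(states, rewards, baseline_fn, gamma=0.99):
    """Weights G_t - b(s_t) that multiply grad log pi(a_t | s_t)."""
    returns = discounted_returns(rewards, gamma)
    baselines = np.array([baseline_fn(s) for s in states])
    return returns - baselines


def toy_value_fn(state):
    """Hypothetical stand-in for an optimal control value function:
    negative Euclidean distance to a goal placed at the origin."""
    return -float(np.linalg.norm(state))


# Sparse-reward toy trajectory: reward only on the final step.
states = [np.array([3.0, 4.0]), np.array([0.0, 0.0])]
rewards = [0.0, 1.0]
weights = advantages_with_baseline(states, rewards, toy_value_fn, gamma=0.5)
```

With a sparse reward the returns alone carry little signal early in training, so in this sketch the baseline term is what differentiates states by their distance to the goal. How the paper constructs and applies its value function is described in the full text, not here.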

Keywords

cs.LG, cs.AI, cs.RO, eess.SY

