Learning Memory-Dependent Continuous Control from Demonstrations

Siqing Hou, Dongqi Han, Jun Tani

Publication year
2021
Access
Open access

Abstract

Efficient exploration has long been a challenge in reinforcement learning, especially when rewards are sparse. A developmental system can overcome this difficulty by learning from both demonstrations and self-exploration. However, existing methods are not applicable to most real-world robotic control problems because they assume that environments follow Markov decision processes (MDPs); thus, they do not extend to partially observable environments where historical observations are necessary for decision making. This paper builds on the idea of replaying demonstrations and extends it to memory-dependent continuous control by proposing a novel algorithm, Recurrent Actor-Critic with Demonstration and Experience Replay (READER). Experiments on several memory-crucial continuous control tasks show that our method significantly reduces the number of interactions with the environment required, given a reasonably small number of demonstration samples. The algorithm also shows better sample efficiency and learning capabilities than a baseline reinforcement learning algorithm for memory-based control from demonstrations.

Keywords

cs.LG, cs.AI
