Master-Slave Policy Collaboration for Actor-Critic Methods
Xiaomu Li, Quan Liu
- Publication year: 2022
- Citations: 3
Abstract
Actor-critic methods in deep reinforcement learning are widely used for continuous control tasks. However, the difficulty of balancing exploration and exploitation, together with the limited learning efficiency of actors, slows overall learning progress and leads to suboptimal policies. To alleviate these problems, we introduce a master-slave policy collaboration mechanism (MSPC). At different stages of training, actors are divided into master actors and slave actors: the better-performing master actors dominate training, while the slave actors extract the masters' knowledge through policy distillation, improving the learning and collaboration efficiency of all actors. Moreover, we propose a new experience replay mechanism, called CER, to further improve the exploration ability and performance of master actors. Finally, we empirically demonstrate the advantages of MSPC by applying it to existing state-of-the-art actor-critic methods on MuJoCo robot simulation tasks. We also show that MSPC+CER improves on MSPC in sample efficiency and learning speed.
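The master/slave split and the distillation step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes deterministic linear actors, a mean-squared-error distillation loss on actions, and ranking by a made-up recent-return score. The names `split_actors`, `distill_step`, and `action_gap` are hypothetical, and CER is not modeled because the abstract does not define it.

```python
import numpy as np

def split_actors(returns, n_masters=1):
    """Rank actors by recent return (assumed criterion); the best become masters."""
    order = np.argsort(returns)[::-1]  # indices sorted best-first
    return order[:n_masters].tolist(), order[n_masters:].tolist()

def distill_step(slave_w, master_w, states, lr=0.1):
    """One gradient step pulling the slave's actions toward the master's.

    Loss (assumed): batch mean of the squared action difference.
    """
    diff = states @ slave_w - states @ master_w   # shape (batch, action_dim)
    grad = 2.0 * states.T @ diff / len(states)    # gradient w.r.t. slave_w
    return slave_w - lr * grad

rng = np.random.default_rng(0)
state_dim, action_dim = 4, 2

# Three toy linear actors: action = state @ W (illustrative stand-in for networks).
actors = [rng.normal(size=(state_dim, action_dim)) for _ in range(3)]
recent_returns = np.array([120.0, 310.0, 95.0])   # made-up scores for illustration

masters, slaves = split_actors(recent_returns, n_masters=1)
master_w = actors[masters[0]]
states = rng.normal(size=(64, state_dim))         # stand-in for replay-buffer states

def action_gap(w):
    """Mean squared difference between an actor's actions and the master's."""
    return float(np.mean((states @ w - states @ master_w) ** 2))

gap_before = [action_gap(actors[i]) for i in slaves]
for _ in range(200):
    for i in slaves:
        actors[i] = distill_step(actors[i], master_w, states)
gap_after = [action_gap(actors[i]) for i in slaves]
```

In this sketch the roles are fixed once. The abstract states that actors are divided at different stages of training, so a fuller version would presumably re-run `split_actors` periodically as returns change.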