Master-Slave Policy Collaboration for Actor-Critic Methods

Xiaomu Li, Quan Liu

Published: 2022
Citations: 3

Abstract

Actor-critic methods in deep reinforcement learning are widely used for continuous control tasks. However, the difficulty of balancing exploration and exploitation, together with the limited learning efficiency of individual actors, slows down overall learning and leads to suboptimal policies. To alleviate these problems, we introduce a master-slave policy collaboration mechanism (MSPC). At different stages of training, actors are divided into master actors and slave actors: the better-performing master actors dominate training, while the slave actors extract the masters' knowledge through policy distillation, improving learning and collaboration efficiency across all actors. Moreover, we propose a new experience replay mechanism, called CER, to further improve the exploration ability and performance of the master actors. Finally, we demonstrate empirically the advantages of MSPC by applying it to existing state-of-the-art actor-critic methods on MuJoCo robot simulation tasks. We also show that MSPC+CER improves on MSPC in sample efficiency and learning speed.
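The master/slave split and the distillation step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how actors are ranked or which distillation loss is used, so ranking actors by recent evaluation return, a fixed number of masters, deterministic linear policies, and a mean-squared-error loss on actions are all assumptions made here. The helpers `split_master_slave`, `linear_action`, `distillation_loss`, and `distill_step` are hypothetical names, and CER is not modeled at all.

```python
"""Hypothetical sketch of a master/slave split with action-level policy distillation.

Assumptions (not taken from the paper): actors are ranked by recent evaluation
return, the top `n_masters` become masters, policies are deterministic and
linear, and a slave minimises the mean squared error to a master's actions.
"""
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def split_master_slave(returns: Sequence[float], n_masters: int) -> Tuple[List[int], List[int]]:
    """Return (master_ids, slave_ids); the highest-return actors become masters."""
    if not 1 <= n_masters < len(returns):
        raise ValueError("need at least one master and one slave")
    order = sorted(range(len(returns)), key=lambda i: returns[i], reverse=True)
    return order[:n_masters], order[n_masters:]


def linear_action(weights: Matrix, state: Vector) -> Vector:
    """Deterministic linear policy: action = weights @ state."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]


def distillation_loss(slave_actions: Sequence[Vector], master_actions: Sequence[Vector]) -> float:
    """Mean squared error between slave and master actions on the same states."""
    total, count = 0.0, 0
    for a_s, a_m in zip(slave_actions, master_actions):
        for x, y in zip(a_s, a_m):
            total += (x - y) ** 2
            count += 1
    return total / count


def distill_step(slave_w: Matrix, master_w: Matrix, states: Sequence[Vector], lr: float = 0.1) -> Matrix:
    """One gradient-descent step on the MSE distillation loss; the master is a fixed teacher."""
    grad = [[0.0] * len(row) for row in slave_w]
    n = len(states) * len(slave_w)  # same normaliser as distillation_loss
    for s in states:
        a_s = linear_action(slave_w, s)
        a_m = linear_action(master_w, s)
        for i, (x, y) in enumerate(zip(a_s, a_m)):
            for j, s_j in enumerate(s):
                # d/dW[i][j] of (1/n) * (a_s[i] - a_m[i])^2
                grad[i][j] += 2.0 * (x - y) * s_j / n
    return [[w - lr * g for w, g in zip(row, g_row)] for row, g_row in zip(slave_w, grad)]


if __name__ == "__main__":
    masters, slaves = split_master_slave([120.0, 340.0, 90.0, 300.0], n_masters=2)
    print(masters, slaves)  # [1, 3] [0, 2]

    states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    master_w = [[1.0, -0.5]]  # a master actor's (fixed) policy weights
    slave_w = [[0.0, 0.0]]    # a slave actor's initial policy weights
    for _ in range(300):
        slave_w = distill_step(slave_w, master_w, states, lr=0.1)
    loss = distillation_loss([linear_action(slave_w, s) for s in states],
                             [linear_action(master_w, s) for s in states])
    print(f"distillation loss after 300 steps: {loss:.2e}")
```

In a real actor-critic agent the policies would be neural networks and the distillation states would come from a replay buffer; the linear policies here only keep the example self-contained and make the gradient easy to check by hand.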

Keywords

Reinforcement learning, Computer science, Control (management), Artificial intelligence, Architecture, State (computer science), Master/slave, Sample (material), Robot, Human–computer interaction
