Maximum Entropy Reinforcement Learning with Evolution Strategies

Longxiang Shi, Shijian Li, Zheng Qian, Longbing Cao, Yang Long, Gang Pan

Publication year: 2020
Citations: 6

Abstract

Evolution strategies (ES) have recently attracted attention for solving challenging tasks at low computational cost and with high scalability. However, ES-based reinforcement learning (RL) methods are well known to suffer from low stability: without careful design, they are sensitive to local optima and learn unstably. There is therefore an urgent need to improve the stability of ES methods on RL problems. In this paper, we propose a simple yet efficient ES method that stabilizes learning. Specifically, we propose a framework that incorporates maximum entropy reinforcement learning into evolution strategies, and we derive an efficient entropy calculation method for linear policies. Based on this framework, we further present a practical algorithm called maximum entropy evolution policy search, which is efficient and stable for policy search in continuous control. Our algorithm shows high stability across different random seeds and achieves performance comparable to some existing derivative-free RL methods on several well-known MuJoCo robotic control benchmark tasks.
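To make the two ingredients named in the abstract concrete, here is a minimal sketch. It is not the paper's algorithm. It assumes that a linear policy a = W s has its weights perturbed by isotropic Gaussian noise with standard deviation sigma, in which case each action component is Gaussian with variance sigma^2 * ||s||^2 and the action entropy has a closed form. The function names `action_entropy` and `es_step`, the toy fitness function, and all hyperparameters are hypothetical and chosen only for illustration.

```python
import numpy as np

def action_entropy(state, sigma, action_dim):
    """Differential entropy (in nats) of a = (W + sigma * E) @ state.

    Assumption: E has i.i.d. N(0, 1) entries, so each action component is
    Gaussian with variance sigma^2 * ||state||^2, independent of W.
    """
    var = sigma ** 2 * float(np.dot(state, state))
    return 0.5 * action_dim * np.log(2.0 * np.pi * np.e * var)

def es_step(mu, fitness, sigma=0.1, lr=0.05, pairs=64, rng=None):
    """One generic ES update using antithetic (mirrored) perturbations."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal((pairs,) + mu.shape)
    # Fitness difference for each mirrored pair (mu + sigma*e, mu - sigma*e).
    diffs = np.array([fitness(mu + sigma * e) - fitness(mu - sigma * e)
                      for e in eps])
    # Monte Carlo estimate of the gradient of the smoothed objective.
    grad = np.tensordot(diffs, eps, axes=1) / (2.0 * sigma * pairs)
    return mu + lr * grad

def toy_fitness(theta):
    """Hypothetical stand-in for an episode return; maximized at theta = 1."""
    return -float(np.sum((theta - 1.0) ** 2))
```

In a maximum entropy setting, one would typically add an entropy term, weighted by a temperature coefficient, to the return that `fitness` reports. How the paper weights and accumulates that term is not stated in the abstract, so it is left out of this sketch.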

Keywords

Reinforcement learning; Computer science; Scalability; Benchmark (surveying); Entropy (arrow of time); Computation; Mathematical optimization; Principle of maximum entropy; Stability (learning theory); Evolutionary computation
