Actor-critic versus direct policy search: a comparison based on sample complexity

Arnaud de Froissard de Broissia, Olivier Sigaud

Year of publication: 2016
Citations: 9
Access: Open access

Abstract

Sample efficiency is a critical property when optimizing policy parameters for the controller of a robot. In this paper, we evaluate two state-of-the-art policy optimization algorithms. One is a recent deep reinforcement learning method based on an actor-critic algorithm, Deep Deterministic Policy Gradient (DDPG), that has been shown to perform well on various control benchmarks. The other one is a direct policy search method, Covariance Matrix Adaptation Evolution Strategy (CMA-ES), a black-box optimization method that is widely used for robot learning. The algorithms are evaluated on a continuous version of the mountain car benchmark problem, so as to compare their sample complexity. From a preliminary analysis, we expect DDPG to be more sample efficient than CMA-ES, which is confirmed by our experimental results.

Keywords

Sample (material), Computer science, Physics
