Policy Search by Dynamic Programming

J. Andrew Bagnell, Sham M. Kakade, Andrew Y. Ng, Jeff Schneider

Year: 2018
Citations: 133
Access: Open access

Abstract

We consider the policy search approach to reinforcement learning. We show that if a “baseline distribution” is given (indicating roughly how often we expect a good policy to visit each state), then we can derive a policy search algorithm that terminates in a finite number of steps, and for which we can provide non-trivial performance guarantees. We also demonstrate this algorithm on several grid-world POMDPs, a planar biped walking robot, and a double-pole balancing problem.
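
The abstract does not spell out the procedure, so the following is only a rough sketch of the dynamic-programming idea it names, under stated assumptions: a small tabular MDP with a finite horizon, a finite policy class that can be enumerated, and a baseline distribution supplied for every time step. Working backward from the last step, it picks for each step the policy in the class that maximizes expected value under that step's baseline distribution, given the policies already chosen for later steps. The names `psdp`, `P`, `R`, `mu`, and `policy_class`, and the two-state toy problem, are hypothetical illustrations and are not taken from the paper.

```python
from itertools import product


def psdp(P, R, mu, horizon, policy_class):
    """Backward policy search sketch (assumed tabular setting).

    P[s][a]  -> list of (probability, next_state) pairs
    R[s][a]  -> immediate reward
    mu[t][s] -> baseline probability of being in state s at time t
    policy_class -> iterable of tuples, pi[s] = action taken in state s
    """
    n_states = len(P)
    candidates = list(policy_class)
    v_next = [0.0] * n_states          # value of the policies chosen for t+1..T-1
    policies = [None] * horizon

    for t in reversed(range(horizon)):
        # One-step lookahead values, given the later policies already fixed.
        q = [
            [R[s][a] + sum(p * v_next[s2] for p, s2 in P[s][a])
             for a in range(len(P[s]))]
            for s in range(n_states)
        ]
        # Pick the policy with the best value averaged over the baseline mu[t].
        best = max(
            candidates,
            key=lambda pi: sum(mu[t][s] * q[s][pi[s]] for s in range(n_states)),
        )
        policies[t] = best
        v_next = [q[s][best[s]] for s in range(n_states)]

    return policies, v_next


# Hypothetical toy problem: two states, action 0 = stay, action 1 = move.
# Staying in state 1 pays reward 1; everything else pays 0.
P = [
    [[(1.0, 0)], [(1.0, 1)]],
    [[(1.0, 1)], [(1.0, 0)]],
]
R = [[0.0, 0.0], [1.0, 0.0]]
horizon = 3
mu = [[0.5, 0.5] for _ in range(horizon)]        # uniform baseline (assumption)
policy_class = product(range(2), repeat=2)        # all deterministic policies

policies, values = psdp(P, R, mu, horizon, policy_class)
print(policies[0], values)
```

The loop runs exactly `horizon` times, which mirrors the abstract's claim of termination in a finite number of steps. With a richer policy class, the exhaustive `max` would have to be replaced by some optimization or supervised-learning step, which this sketch does not attempt.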

Keywords

Reinforcement learning, Computer science, Dynamic programming, Baseline (sea), Grid, Mathematical optimization, State (computer science), Robot, Artificial intelligence, Algorithm
