Efficient Sample Reuse in Policy Gradients with Parameter-Based Exploration
Tingting Zhao, Hirotaka Hachiya, Voot Tangkaratt, Jun Morimoto, Masashi Sugiyama
- Year
- 2013
- Citations
- 27
Abstract
The policy gradient approach is a flexible and powerful reinforcement learning method particularly for problems with continuous actions such as robot control. A common challenge is how to reduce the variance of policy gradient estimates for reliable policy updates. In this letter, we combine the following three ideas and give a highly effective policy gradient method: (1) policy gradients with parameter-based exploration, a recently proposed policy search method with low variance of gradient estimates; (2) an importance sampling technique, which allows us to reuse previously gathered data in a consistent way; and (3) an optimal baseline, which minimizes the variance of gradient estimates with their unbiasedness being maintained. For the proposed method, we give a theoretical analysis of the variance of gradient estimates and show its usefulness through extensive experiments.
Keywords
Related papers
Statistical Learning Theory
Yuhai Wu, Vladimir Vapnik
1999
Artificial intelligence: a modern approach
1995
Applied Nonlinear Control
Jean-Jacques Slotine, Weiping Li
1991
A new optimizer using particle swarm theory
R.C. Eberhart, James Kennedy
2002