Analyzing the Effect of Stochastic Transitions in Policy Gradients in Deep Reinforcement Learning
Ângelo Gregório Lovatto, Thiago Pereira Bueno, Leliane Nunes de Barros
- Publication year: 2019
- Citations: 4
Abstract
Policy gradient methods in deep reinforcement learning have received increasing attention over the last few years, mainly because of their successful applications to challenging sequential decision-making tasks, such as playing Atari games from pixels and solving simulated robotics tasks. Several publicly available benchmarks have enabled the development and evaluation of novel policy gradient methods; however, few of them incorporate environment stochasticity as a feature. We hypothesize that environments with stochastic dynamics may hinder the performance of these methods and propose an empirical analysis to test this hypothesis. We implement stochastic variations of the classic cartpole balancing task and select and implement three algorithms: Vanilla Policy Gradient, Natural Policy Gradient, and TRPO. Our results show that convergence in these environments is slower for all three algorithms, and that policies obtained in the deterministic setting are not useful in the stochastic variants, even though the expected effect of each action is the same. We suggest that both further analyses of policy gradient algorithms in stochastic environments and new benchmarks that incorporate such tasks are necessary to better evaluate existing and novel solutions.
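To make the idea of a stochastic transition whose expected effect matches the deterministic one more concrete, here is a minimal sketch. It assumes the standard cartpole dynamics and constants (as in the widely used Gym CartPole implementation) and a hypothetical noise model: zero-mean Gaussian noise added to the applied force. The abstract does not say how the paper's stochastic variations are constructed, so `stochastic_step` and `noise_std` are illustrative names, not the authors' code.

```python
import math
import random

# Classic cartpole constants (values as in the common Gym CartPole formulation).
GRAVITY = 9.8
MASS_CART = 1.0
MASS_POLE = 0.1
TOTAL_MASS = MASS_CART + MASS_POLE
HALF_LENGTH = 0.5  # half of the pole's length
POLE_MASS_LENGTH = MASS_POLE * HALF_LENGTH
FORCE_MAG = 10.0   # magnitude of the push applied by each discrete action
TAU = 0.02         # integration time step in seconds


def cartpole_step(state, force):
    """One explicit-Euler step of the cartpole dynamics for a given applied force."""
    x, x_dot, theta, theta_dot = state
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    temp = (force + POLE_MASS_LENGTH * theta_dot ** 2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / TOTAL_MASS)
    )
    x_acc = temp - POLE_MASS_LENGTH * theta_acc * cos_t / TOTAL_MASS
    return (
        x + TAU * x_dot,
        x_dot + TAU * x_acc,
        theta + TAU * theta_dot,
        theta_dot + TAU * theta_acc,
    )


def stochastic_step(state, action, noise_std, rng):
    """Hypothetical stochastic variant: zero-mean Gaussian noise on the applied force.

    Both accelerations are affine in the force, and the Euler update is affine in
    the accelerations, so zero-mean force noise leaves the expected next state
    equal to the deterministic next state; only the spread of outcomes grows.
    """
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    return cartpole_step(state, force + rng.gauss(0.0, noise_std))
```

For example, averaging many `stochastic_step(state, 1, noise_std, rng)` samples from the same state approaches `cartpole_step(state, FORCE_MAG)`, yet individual transitions can differ noticeably from it. A policy trained only on deterministic transitions never encounters that variability.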