Analyzing the Effect of Stochastic Transitions in Policy Gradients in Deep Reinforcement Learning

Ângelo Gregório Lovatto, Thiago Pereira Bueno, Leliane Nunes de Barros

Year
2019
Citations
4

Abstract

Policy gradient methods in deep reinforcement learning have received increasing attention over the last few years, mainly because of their successful applications to challenging sequential decision-making tasks, such as playing Atari games from pixels and solving simulated robotics tasks. Several publicly available benchmarks have enabled the development and evaluation of novel policy gradient methods; however, few incorporate environment stochasticity as a feature. We hypothesize that environments with stochastic dynamics may hinder the performance of these methods and propose an empirical analysis to test this hypothesis. We implement stochastic variations of a classic cartpole balancing task and select and implement three algorithms: Vanilla Policy Gradient, Natural Policy Gradient, and Trust Region Policy Optimization (TRPO). Our results show that convergence in these environments is slower for all three algorithms and that policies obtained in the deterministic setting are not useful in the stochastic variations, even though the expected effect of each action is the same. We suggest that both further analyses of policy gradient algorithms in stochastic environments and new benchmarks that incorporate these tasks are necessary to better evaluate existing and novel solutions.
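The abstract does not say how the stochastic cartpole variations are constructed, so the following is only an illustrative sketch, not the authors' implementation. It assumes one plausible construction: zero-mean Gaussian noise added to the applied force. The names `step` and `noise_std` and the physical constants (the commonly used cartpole values) are assumptions. Because the one-step cartpole update is affine in the force, zero-mean force noise leaves the expected next state equal to the deterministic one, which is one way to read "the expected effect of each action is the same".

```python
import math
import random

# Commonly used cartpole constants (assumed, not taken from the paper).
GRAVITY, CART_MASS, POLE_MASS = 9.8, 1.0, 0.1
POLE_HALF_LENGTH, FORCE_MAG, DT = 0.5, 10.0, 0.02


def step(state, action, rng=None, noise_std=0.0):
    """One Euler step of cartpole; action is 0 (push left) or 1 (push right).

    If rng is given and noise_std > 0, zero-mean Gaussian noise is added to
    the force (a hypothetical noise model chosen for illustration).
    """
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    if rng is not None and noise_std > 0.0:
        force += rng.gauss(0.0, noise_std)  # E[force] is unchanged

    total_mass = CART_MASS + POLE_MASS
    pole_mass_length = POLE_MASS * POLE_HALF_LENGTH
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    # Standard cartpole equations of motion; both accelerations are
    # affine in the force for a fixed state.
    temp = (force + pole_mass_length * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total_mass)
    )
    x_acc = temp - pole_mass_length * theta_acc * cos_t / total_mass

    return (
        x + DT * x_dot,
        x_dot + DT * x_acc,
        theta + DT * theta_dot,
        theta_dot + DT * theta_acc,
    )


if __name__ == "__main__":
    rng = random.Random(0)
    start = (0.0, 0.0, 0.05, 0.0)
    deterministic = step(start, 1)
    samples = [step(start, 1, rng, noise_std=5.0) for _ in range(20000)]
    mean = [sum(column) / len(samples) for column in zip(*samples)]
    print("deterministic next state:", deterministic)
    print("mean noisy next state:   ", mean)
```

The equality of expectations holds only for a single step from a fixed state. Over a full episode the noise compounds through the nonlinear dynamics, which is consistent with the abstract's finding that policies from the deterministic setting do not carry over.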

Keywords

Reinforcement learning, Computer science, Task (project management), Artificial intelligence, Convergence (economics), Machine learning, Robotics, Feature (linguistics), Action (physics), Robot
