
Analyzing the Effect of Stochastic Transitions in Policy Gradients in Deep Reinforcement Learning

Ângelo Gregório Lovatto, Thiago Pereira Bueno, Leliane Nunes de Barros

Publication year
2019
Citations
4

Abstract

Policy gradient methods in deep reinforcement learning have received increasing attention over the last few years, mainly because of their successful applications to challenging sequential decision-making tasks, such as playing Atari games from pixels and solving simulated robotics tasks. Several publicly available benchmarks have enabled the development and evaluation of novel policy gradient methods; however, few incorporate environment stochasticity as a feature. We hypothesize that environments with stochastic dynamics may hinder the performance of these methods and propose an empirical analysis to test this hypothesis. We implement stochastic variations of a classic cartpole balancing task and select and implement three algorithms: Vanilla Policy Gradient, Natural Policy Gradient, and Trust Region Policy Optimization (TRPO). Our results show that convergence in these environments is slower for all three algorithms and that policies obtained in the deterministic setting are not useful in the stochastic variants, even though the expected effect of each action is the same. We suggest that both further analyses of policy gradient algorithms in stochastic environments and new benchmarks that incorporate these tasks are necessary for better evaluating existing and novel solutions.
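To make the phrase "the expected effect of each action is the same" concrete, here is a minimal sketch of one way a stochastic cartpole transition could be built. The abstract does not describe the authors' construction, so everything below is an assumption for illustration: the function name `cartpole_step`, the zero-mean Gaussian noise added to the applied force, the `noise_std` parameter, and the physical constants (commonly used cartpole values).

```python
import math
import random

# Assumed constants (commonly used cartpole values, not taken from the paper).
GRAVITY = 9.8
CART_MASS = 1.0
POLE_MASS = 0.1
POLE_HALF_LENGTH = 0.5
FORCE_MAG = 10.0
TAU = 0.02  # integration time step in seconds


def cartpole_step(state, action, rng, noise_std=0.0):
    """One explicit-Euler step of cartpole with optional zero-mean force noise.

    Hypothetical helper: noise_std=0.0 gives deterministic dynamics;
    noise_std>0 perturbs the applied force with Gaussian noise of mean zero.
    """
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    if noise_std > 0.0:
        force += rng.gauss(0.0, noise_std)  # zero mean keeps the expected force unchanged

    total_mass = CART_MASS + POLE_MASS
    pole_mass_length = POLE_MASS * POLE_HALF_LENGTH
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    # Both accelerations are affine in the applied force for a fixed state.
    temp = (force + pole_mass_length * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total_mass)
    )
    x_acc = temp - pole_mass_length * theta_acc * cos_t / total_mass

    return (
        x + TAU * x_dot,
        x_dot + TAU * x_acc,
        theta + TAU * theta_dot,
        theta_dot + TAU * theta_acc,
    )


if __name__ == "__main__":
    rng = random.Random(0)
    start = (0.0, 0.0, 0.05, 0.0)
    deterministic = cartpole_step(start, 1, rng)
    noisy = [cartpole_step(start, 1, rng, noise_std=5.0) for _ in range(20000)]
    mean_next = tuple(sum(c) / len(noisy) for c in zip(*noisy))
    print("deterministic next state:", deterministic)
    print("mean noisy next state:   ", mean_next)
```

Under these assumptions the accelerations are affine in the applied force and the update is a single Euler step, so the average of many noisy next states approaches the deterministic next state, while any individual trajectory can still drift away from it over many steps.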

Keywords

Reinforcement learning, Computer science, Task (project management), Artificial intelligence, Convergence (economics), Machine learning, Robotics, Feature (linguistics), Action (physics), Robot
