Linear-Quadratic Mean-Field Reinforcement Learning: Convergence of Policy Gradient Methods

René Carmona, Mathieu Laurière, Zongjun Tan

Year of publication: 2019
Access: Open access

Abstract

We investigate reinforcement learning in the setting of Markov decision processes for a large number of exchangeable agents interacting in a mean-field manner. Applications include, for example, the control of a large number of robots communicating through a central unit that dispatches the optimal policy computed by maximizing an aggregate reward. An approximate solution is obtained by learning the optimal policy of a generic agent interacting with the statistical distribution of the states and actions of the other agents. We first provide a full analysis of this discrete-time mean-field control problem. We then rigorously prove the convergence of exact and model-free policy gradient methods in a mean-field linear-quadratic setting and establish bounds on the rates of convergence. We also provide graphical evidence of the convergence based on implementations of our algorithms.
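To make the distinction between exact and model-free policy gradient concrete, the following is a minimal sketch on a scalar, deterministic linear-quadratic problem with no mean-field term. It is an illustration under stated assumptions, not the paper's algorithm: the dynamics x_{t+1} = a x_t + b u_t, the cost weights q and r, the step size, the starting gain, and all function names are hypothetical choices. The exact variant uses the closed-form gradient of the cost with respect to the gain k of the linear policy u = -k x; the model-free variant uses a two-point zeroth-order estimate built only from simulated rollout costs, which is one common model-free choice and not necessarily the estimator analyzed in the paper.

```python
import random

def rollout_cost(k, a=0.9, b=0.5, q=1.0, r=0.1, x0=1.0, horizon=200):
    """Simulated cost of u = -k*x on x_{t+1} = a*x + b*u over a finite horizon."""
    x, total = x0, 0.0
    for _ in range(horizon):
        u = -k * x
        total += q * x * x + r * u * u
        x = a * x + b * u
    return total

def exact_gradient(k, a=0.9, b=0.5, q=1.0, r=0.1, x0=1.0):
    """Closed-form dC/dk for C(k) = (q + r k^2) x0^2 / (1 - (a - b k)^2)."""
    c = a - b * k  # closed-loop coefficient; the cost is finite only if |c| < 1
    if abs(c) >= 1.0:
        raise ValueError("gain k is not stabilizing")
    num, den = q + r * k * k, 1.0 - c * c
    return x0 * x0 * (2.0 * r * k * den - num * 2.0 * b * c) / (den * den)

def zeroth_order_gradient(k, rng, sigma=1e-3):
    """Two-point estimate from rollout costs only; a and b are never read here."""
    v = rng.choice([-1.0, 1.0])  # random direction; in one dimension just a sign
    diff = rollout_cost(k + sigma * v) - rollout_cost(k - sigma * v)
    return diff / (2.0 * sigma) * v

def policy_gradient(grad_fn, k0=0.5, step=0.1, iters=500):
    """Plain gradient descent on the gain k (assumed step size and start)."""
    k = k0
    for _ in range(iters):
        k -= step * grad_fn(k)
    return k

def riccati_gain(a=0.9, b=0.5, q=1.0, r=0.1, iters=1000):
    """Reference optimal gain from fixed-point iteration of the scalar Riccati equation."""
    p = q
    for _ in range(iters):
        p = q + a * a * p * r / (r + b * b * p)
    return a * b * p / (r + b * b * p)

k_star = riccati_gain()
k_exact = policy_gradient(exact_gradient)
rng = random.Random(0)
k_model_free = policy_gradient(lambda k: zeroth_order_gradient(k, rng))
print(k_star, k_exact, k_model_free)
```

In this toy setting both iterations should approach the Riccati gain, which is the behavior the convergence results describe in the far more general mean-field setting. The starting gain is chosen to be stabilizing, since the cost is infinite otherwise.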

Keywords

math.OC, cs.LG