Centralized Model and Exploration Policy for Multi-Agent RL
Qizhen Zhang, Chris Lu, Animesh Garg, Jakob Foerster
- Year
- 2021
- Access
- Open access
Abstract
Reinforcement learning (RL) in partially observable, fully cooperative multi-agent settings (Dec-POMDPs) can in principle be used to address many real-world challenges such as controlling a swarm of rescue robots or a team of quadcopters. However, Dec-POMDPs are significantly harder to solve than single-agent problems, with the former being NEXP-complete and the latter, MDPs, being just P-complete. Hence, current RL algorithms for Dec-POMDPs suffer from poor sample complexity, which greatly reduces their applicability to practical problems where environment interaction is costly. Our key insight is that using just a polynomial number of samples, one can learn a centralized model that generalizes across different policies. We can then optimize the policy within the learned model instead of the true system, without requiring additional environment interactions. We also learn a centralized exploration policy within our model that learns to collect additional data in state-action regions with high model uncertainty. We empirically evaluate the proposed model-based algorithm, MARCO, in three cooperative communication tasks, where it improves sample efficiency by up to 20x. Finally, to investigate the theoretical sample complexity, we adapt an existing model-based method for tabular MDPs to Dec-POMDPs, and prove that it achieves polynomial sample complexity.
Keywords
Related papers
Hierarchical decision-making for UAVs’ game via LLM enhanced multi-agent reinforcement learning
Xinyu Dong, Bo Li, Guangyu Zhang +2 more
Aerospace Science and Technology · 2026
Dynamic reconfiguration in multi-robot agent systems using embedded language models
Shokhikha Amalana Murdivien, Jongsu Park, Jumyung Um
Robotics and Computer-Integrated Manufacturing · 2026
Formation optimization and obstacle avoidance decision-making methods for cooperative coverage search of multi-UUVs in underwater wreck areas
Haomiao Yu, Zeyuan Zhang, Yantian Ma
Robotics and Autonomous Systems · 2026
Human-in-the-Loop Swarms: A Bionic Swarm Approach to Real-World Soil Mapping
Petras Swissler, Mohammadali Rashidioun, Nicholas Sahu +3 more
2026