Multiagent Rollout and Policy Iteration for POMDP with Application to Multi-Robot Repair Problems
Sushmita Bhattacharya, Siva Kailas, Sahil Badyal, Stephanie Gil, Dimitri Bertsekas
- 发表年份
- 2020
- 访问权限
- 开放获取
摘要
In this paper we consider infinite horizon discounted dynamic programming problems with finite state and control spaces, partial state observations, and a multiagent structure. We discuss and compare algorithms that simultaneously or sequentially optimize the agents' controls by using multistep lookahead, truncated rollout with a known base policy, and a terminal cost function approximation. Our methods specifically address the computational challenges of partially observable multiagent problems. In particular: 1) We consider rollout algorithms that dramatically reduce required computation while preserving the key cost improvement property of the standard rollout method. The per-step computational requirements for our methods are on the order of $O(Cm)$ as compared with $O(C^m)$ for standard rollout, where $C$ is the maximum cardinality of the constraint set for the control component of each agent, and $m$ is the number of agents. 2) We show that our methods can be applied to challenging problems with a graph structure, including a class of robot repair problems whereby multiple robots collaboratively inspect and repair a system under partial information. 3) We provide a simulation study that compares our methods with existing methods, and demonstrate that our methods can handle larger and more complex partially observable multiagent problems (state space size $10^{37}$ and control space size $10^{7}$, respectively). Finally, we incorporate our multiagent rollout algorithms as building blocks in an approximate policy iteration scheme, where successive rollout policies are approximated by using neural network classifiers. While this scheme requires a strictly off-line implementation, it works well in our computational experiments and produces additional significant performance improvement over the single online rollout iteration method.
关键词
相关论文
基于嵌入式语言模型的多机器人系统动态重构
Shokhikha Amalana Murdivien, Jongsu Park, Jumyung Um
Robotics and Computer-Integrated Manufacturing · 2026
基于大语言模型增强的多智能体强化学习的无人机博弈分层决策
Xinyu Dong, Bo Li, Guangyu Zhang 等 5 位作者
Aerospace Science and Technology · 2026
水下残骸区域多UUV协同覆盖搜索的编队优化与避碰决策方法
Haomiao Yu, Zeyuan Zhang, Yantian Ma
Robotics and Autonomous Systems · 2026
人在回路中的群体机器人:一种用于真实土壤测绘的仿生群体方法
Petras Swissler, Mohammadali Rashidioun, Nicholas Sahu 等 6 位作者
2026