MA-VLCM: A Vision Language Critic Model for Value Estimation of Policies in Multi-Agent Team Settings
Shahil Shaik, Aditya Parameshwaran, Anshul Nayak, Jonathon M. Smereka, Yue Wang
- Year
- 2026
- Access
- Open access
Abstract
Multi-agent reinforcement learning (MARL) commonly relies on a centralized critic to estimate the value function. However, learning such a critic from scratch is highly sample-inefficient and often lacks generalization across environments. At the same time, large vision-language-action models (VLAs) trained on internet-scale data exhibit strong multimodal reasoning and zero-shot generalization capabilities, yet directly deploying them for robotic execution remains computationally prohibitive, particularly in heterogeneous multi-robot systems with diverse embodiments and resource constraints. To address these challenges, we propose Multi-Agent Vision-Language-Critic Models (MA-VLCM), a framework that replaces the learned centralized critic in MARL with a pretrained vision-language model fine-tuned to evaluate multi-agent behavior. MA-VLCM acts as a centralized critic conditioned on natural language task descriptions, visual trajectory observations, and structured multi-agent state information. By eliminating critic learning during policy optimization, our approach significantly improves sample efficiency while producing compact execution policies suitable for deployment on resource-constrained robots. Results show good zero-shot return estimation on models with differing VLM backbones on in-distribution and out-of-distribution scenarios in multi-agent team settings
Keywords
Related papers
Dynamic reconfiguration in multi-robot agent systems using embedded language models
Shokhikha Amalana Murdivien, Jongsu Park, Jumyung Um
Robotics and Computer-Integrated Manufacturing · 2026
Hierarchical decision-making for UAVs’ game via LLM enhanced multi-agent reinforcement learning
Xinyu Dong, Bo Li, Guangyu Zhang +2 more
Aerospace Science and Technology · 2026
Formation optimization and obstacle avoidance decision-making methods for cooperative coverage search of multi-UUVs in underwater wreck areas
Haomiao Yu, Zeyuan Zhang, Yantian Ma
Robotics and Autonomous Systems · 2026
Human-in-the-Loop Swarms: A Bionic Swarm Approach to Real-World Soil Mapping
Petras Swissler, Mohammadali Rashidioun, Nicholas Sahu +3 more
2026