Simulation to Rules: A Dual-VLM Framework for Formal Visual Planning
Yilun Hao, Yongchao Chen, Chuchu Fan, Yang Zhang
- Year
- 2025
- Access
- Open access
Abstract
Vision Language Models (VLMs) show strong potential for visual planning but struggle with precise spatial and long-horizon reasoning, while Planning Domain Definition Language (PDDL) planners excel at formal long-horizon planning but cannot interpret visual inputs. Recent works combine these complementary advantages by translating visual problems into PDDL. However, while VLMs can generate PDDL problem files satisfactorily, accurately generating PDDL domain files, which encode planning rules, remains challenging and typically requires human expertise or environment interaction. We propose VLMFP, a Dual-VLM-guided framework that autonomously generates both PDDL problem and domain files for formal visual planning. VLMFP combines a SimVLM that simulates action consequences with a GenVLM that generates and iteratively refines PDDL files by aligning symbolic execution with simulated outcomes, enabling multiple levels of generalization across unseen instances, visual appearances, and game rules. We evaluate VLMFP on 6 grid-world domains and demonstrate its generalization capability. On average, SimVLM achieves 87.3% and 86.0% scenario understanding and action simulation for seen and unseen appearances, respectively. With the guidance of SimVLM, VLMFP attains 70.0%, 54.1% planning success on unseen instances in seen and unseen appearances, respectively. We further demonstrate that VLMFP scales to complex long-horizon 3D planning tasks, including multi-robot collaboration and assembly scenarios with partial observability and diverse visual variations. Project page: https://sites.google.com/view/vlmfp.
Keywords
Related papers
Dynamic reconfiguration in multi-robot agent systems using embedded language models
Shokhikha Amalana Murdivien, Jongsu Park, Jumyung Um
Robotics and Computer-Integrated Manufacturing · 2026
Hierarchical decision-making for UAVs’ game via LLM enhanced multi-agent reinforcement learning
Xinyu Dong, Bo Li, Guangyu Zhang +2 more
Aerospace Science and Technology · 2026
Formation optimization and obstacle avoidance decision-making methods for cooperative coverage search of multi-UUVs in underwater wreck areas
Haomiao Yu, Zeyuan Zhang, Yantian Ma
Robotics and Autonomous Systems · 2026
Human-in-the-Loop Swarms: A Bionic Swarm Approach to Real-World Soil Mapping
Petras Swissler, Mohammadali Rashidioun, Nicholas Sahu +3 more
2026