Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model
John Won, Kyungmin Lee, Huiwon Jang, Dongyoung Kim, Jinwoo Shin
- Year
- 2025
- Access
- Open access
Abstract
Augmenting vision-language-action models (VLAs) with world models is promising for robotic policy learning but faces challenges in jointly predicting states and actions due to the modality gap. To address this, we propose DUal-STream diffusion (DUST), a world-model augmented VLA framework featuring a multimodal diffusion transformer that maintains separate modality streams while enabling cross-modal knowledge sharing. In addition, DUST utilizes independent noise perturbations and a decoupled flow matching loss to learn cross-modal causal relationships. We further introduce an asynchronous sampling method for action and vision tokens that enhances performance through inference-time scaling. Experimental results on simulated benchmarks like RoboCasa and GR-1 show that DUST achieves up to 6% gains over state-of-the-art VLA and world-modeling baselines, with inference-time scaling providing an additional 2-5% improvement. In real-world tasks using the Franka Research 3, DUST outperforms baselines by 10% in success rate. Finally, we demonstrate that DUST enables effective transfer learning through both pretraining on action-free videos and joint-training with heterogeneous robot and human datasets.
Keywords
Related papers
Parallel Differentiable Reachability for Learning and Planning with Certified Neural Dynamics and Controllers
Keyi Shen, Glen Chou
2026
Artificial Intelligence enhanced smart welding islands: Foundation models revolutionizing manufacturing
Xiwei Wu, Wei Wu, Qiqi Chen +6 more
Robotics and Computer-Integrated Manufacturing · 2026
A deep reinforcement learning and a dynamic graph neural network-based scheduling agent to control a multi-task robot
Hedi Boukamcha, Anas Neumann, Monia Rekik +3 more
Robotics and Computer-Integrated Manufacturing · 2026
LLM Agent-driven Automated DFA Assessment with Fine-tuning and AAS-based RAG
Jiaxin Liu, Xiaofeng Zhou, Suyang Yu +5 more
Robotics and Computer-Integrated Manufacturing · 2026