D3D-VLP: Dynamic 3D Vision-Language-Planning Model for Embodied Grounding and Navigation
Zihan Wang, Seungjun Lee, Guangzhao Dai, Gim Hee Lee
- 发表年份
- 2025
- 访问权限
- 开放获取
摘要
Embodied agents face a critical dilemma that end-to-end models lack interpretability and explicit 3D reasoning, while modular systems ignore cross-component interdependencies and synergies. To bridge this gap, we propose the Dynamic 3D Vision-Language-Planning Model (D3D-VLP). Our model introduces two key innovations: 1) A Dynamic 3D Chain-of-Thought (3D CoT) that unifies planning, grounding, navigation, and question answering within a single 3D-VLM and CoT pipeline; 2) A Synergistic Learning from Fragmented Supervision (SLFS) strategy, which uses a masked autoregressive loss to learn from massive and partially-annotated hybrid data. This allows different CoT components to mutually reinforce and implicitly supervise each other. To this end, we construct a large-scale dataset with 10M hybrid samples from 5K real scans and 20K synthetic scenes that are compatible with online learning methods such as RL and DAgger. Our D3D-VLP achieves state-of-the-art results on multiple benchmarks, including Vision-and-Language Navigation (R2R-CE, REVERIE-CE, NavRAG-CE), Object-goal Navigation (HM3D-OVON), and Task-oriented Sequential Grounding and Navigation (SG3D). Real-world mobile manipulation experiments further validate the effectiveness.
关键词
相关论文
面向大型复杂构件的移动机器人辅助磨削技术综述
Yusen Li, Ziwei Wang, Xiangye Zhu 等 12 位作者
Robotics and Computer-Integrated Manufacturing · 2026
基于物理信息与机器学习的五轴铣削TC4钛合金刀具磨损融合预测模型
Shaoqing Qin, Lida Zhu, Yanpeng Hao 等 10 位作者
Robotics and Computer-Integrated Manufacturing · 2026
通过新型压电主动阻尼刀柄提升机器人铣削质量
Bo Li, Yuanbo Zhao, Huijie Xiao 等 6 位作者
Robotics and Computer-Integrated Manufacturing · 2026
一种利用磁致非线性宽带多向被动减振器抑制机器人铣削低频颤振的新方法
Hao Li, Yuhui Yu, Rui Fu 等 6 位作者
Robotics and Computer-Integrated Manufacturing · 2026