PERCEPTION

Endowing Embodied Agents with Spatial Reasoning Capabilities for Vision-and-Language Navigation

Qianqian Bai, Zhongpu Chen, Ling Luo, Huaming Du, Yuqian Lei, Ziyun Jiao

Publication year
2025
Access
Open access

Abstract

Enhancing the spatial perception capabilities of mobile robots is crucial for achieving embodied Vision-and-Language Navigation (VLN). Although significant progress has been made in simulated environments, transferring these capabilities directly to real-world scenarios often causes severe hallucinations, leaving robots without effective spatial awareness. To address this issue, we propose BrainNav, a bio-inspired spatial cognitive navigation framework grounded in biological spatial cognition theories and cognitive map theory. BrainNav integrates dual-map (coordinate map and topological map) and dual-orientation (relative orientation and absolute orientation) strategies, enabling real-time navigation through dynamic scene capture and path planning. Its five core modules (Hippocampal Memory Hub, Visual Cortex Perception Engine, Parietal Spatial Constructor, Prefrontal Decision Center, and Cerebellar Motion Execution Unit) mimic biological cognitive functions to reduce spatial hallucinations and enhance adaptability. Validated zero-shot in a real-world lab environment on the Limo Pro robot, BrainNav, which is compatible with GPT-4, outperforms existing state-of-the-art (SOTA) Vision-and-Language Navigation in Continuous Environments (VLN-CE) methods without fine-tuning.
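The abstract names a dual-map (coordinate plus topological) and dual-orientation (absolute plus relative) strategy but does not describe its data structures. The following is a minimal, hypothetical sketch of that general idea, not BrainNav's implementation. The class `DualMap`, the methods `plan` and `absolute_bearing`, and the function `relative_direction` are illustrative names. Breadth-first search as the planner, the counterclockwise-from-+x angle convention, and the 45° sector thresholds are all assumptions made for this example.

```python
import math
from collections import deque


class DualMap:
    """Hypothetical dual map: metric coordinates plus a topological graph."""

    def __init__(self):
        self.coords = {}  # coordinate map: place name -> (x, y) in the world frame
        self.edges = {}   # topological map: place name -> set of directly reachable places

    def add_place(self, name, x, y):
        self.coords[name] = (x, y)
        self.edges.setdefault(name, set())

    def connect(self, a, b):
        # Undirected traversability link between two places
        self.edges[a].add(b)
        self.edges[b].add(a)

    def plan(self, start, goal):
        # Breadth-first search over the topological map (fewest hops); assumed planner
        parent = {start: None}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if node == goal:
                path = []
                while node is not None:
                    path.append(node)
                    node = parent[node]
                return path[::-1]
            for nxt in sorted(self.edges[node]):
                if nxt not in parent:
                    parent[nxt] = node
                    queue.append(nxt)
        return None  # goal not reachable in the topological map

    def absolute_bearing(self, src, dst):
        # Absolute orientation: world-frame heading from src to dst,
        # degrees counterclockwise from the +x axis, in [0, 360)
        (x0, y0), (x1, y1) = self.coords[src], self.coords[dst]
        return math.degrees(math.atan2(y1 - y0, x1 - x0)) % 360.0


def relative_direction(robot_heading_deg, target_bearing_deg):
    # Relative orientation: express an absolute bearing relative to the robot's
    # own heading; the signed difference is wrapped into [-180, 180)
    diff = (target_bearing_deg - robot_heading_deg + 180.0) % 360.0 - 180.0
    if -45.0 <= diff <= 45.0:
        return "front"
    if 45.0 < diff < 135.0:
        return "left"   # counterclockwise of the heading
    if -135.0 < diff < -45.0:
        return "right"  # clockwise of the heading
    return "back"
```

For example, with places `door` (0, 0), `hall` (2, 0), and `desk` (2, 3) linked in a chain, `plan("door", "desk")` returns `["door", "hall", "desk"]`. A robot at `hall` facing 0° would see the `desk` bearing of 90° as `"left"`. Keeping the two maps separate lets the graph answer "which places connect" while the coordinates answer "which way to turn."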

Keywords

cs.AI, cs.RO
