Home /Research /Endowing Embodied Agents with Spatial Reasoning Capabilities for Vision-and-Language Navigation
PERCEPTION

Endowing Embodied Agents with Spatial Reasoning Capabilities for Vision-and-Language Navigation

Qianqian Bai, Zhongpu Chen, Ling Luo, Huaming Du, Yuqian Lei, Ziyun Jiao

Year
2025
Access
Open access

Abstract

Enhancing the spatial perception capabilities of mobile robots is crucial for achieving embodied Vision-and-Language Navigation (VLN). Although significant progress has been made in simulated environments, directly transferring these capabilities to real-world scenarios often results in severe hallucination phenomena, causing robots to lose effective spatial awareness. To address this issue, we propose BrainNav, a bio-inspired spatial cognitive navigation framework inspired by biological spatial cognition theories and cognitive map theory. BrainNav integrates dual-map (coordinate map and topological map) and dual-orientation (relative orientation and absolute orientation) strategies, enabling real-time navigation through dynamic scene capture and path planning. Its five core modules-Hippocampal Memory Hub, Visual Cortex Perception Engine, Parietal Spatial Constructor, Prefrontal Decision Center, and Cerebellar Motion Execution Unit-mimic biological cognitive functions to reduce spatial hallucinations and enhance adaptability. Validated in a zero-shot real-world lab environment using the Limo Pro robot, BrainNav, compatible with GPT-4, outperforms existing State-of-the-Art (SOTA) Vision-and-Language Navigation in Continuous Environments (VLN-CE) methods without fine-tuning.

Keywords

cs.AIcs.RO

Related papers

Browse all PERCEPTION papers