EchoVLA: Synergistic Declarative Memory for VLA-Driven Mobile Manipulation
Min Lin, Xiwen Liang, Bingqian Lin, Liu Jingzhi, Zijian Jiao, Kehan Li, Yu Sun, Weijia Liufu, Yuhan Ma, Yuecheng Liu, Shen Zhao, Yuzheng Zhuang, Xiaodan Liang
- Year
- 2025
- Access
- Open access
Abstract
Recent progress in Vision-Language-Action (VLA) models has enabled embodied agents to interpret multimodal instructions and perform complex tasks. However, existing VLAs are mostly confined to short-horizon, table-top manipulation, lacking the memory and reasoning capability required for mobile manipulation, where agents must coordinate navigation and manipulation under changing spatial contexts. In this work, we present EchoVLA, a memory-aware VLA model for mobile manipulation. EchoVLA incorporates a synergistic declarative memory inspired by the human brain, consisting of a scene memory that maintains a collection of spatial-semantic maps and an episodic memory that stores task-level experiences with multimodal contextual features. The two memories are individually stored, updated, and retrieved based on current observations, task history, and instructions, and their retrieved representations are fused via coarse- and fine-grained attention to guide base-arm diffusion policies. To support large-scale training, we further introduce MoMani, an automated benchmark that generates expert-level trajectories through multimodal large language model (MLLM)-guided planning and feedback-driven refinement, supplemented with real-robot demonstrations. Comprehensive simulated and real-world results demonstrate that EchoVLA substantially improves overall performance, e.g., it achieves the highest success rates of 0.52 on manipulation/navigation tasks and 0.31 on mobile manipulation tasks in simulation, exceeding the strong baseline $π_{0.5}$ by +0.20 and +0.11, respectively.
Keywords
Related papers
State-of-the-art in mobile robot-assisted grinding technologies for large-scale complex components
Yusen Li, Ziwei Wang, Xiangye Zhu +9 more
Robotics and Computer-Integrated Manufacturing · 2026
A fusion prediction model of tool wear based on physical information and machine learning in five-axis milling TC4 titanium alloy
Shaoqing Qin, Lida Zhu, Yanpeng Hao +7 more
Robotics and Computer-Integrated Manufacturing · 2026
Enhancing robotic milling quality via a novel piezoelectric active damping toolholder
Bo Li, Yuanbo Zhao, Huijie Xiao +3 more
Robotics and Computer-Integrated Manufacturing · 2026
A novel method of suppressing low-frequency chatter in robotic milling using magnetically-induced nonlinear broadband multidirectional passive vibration absorber
Hao Li, Yuhui Yu, Rui Fu +3 more
Robotics and Computer-Integrated Manufacturing · 2026