GuidedVLA: Specifying Task-Relevant Factors via Plug-and-Play Action Attention Specialization
Xiaosong Jia, Bowen Yang, Zuhao Ge, Xian Nie, Yuchen Zhou, Cunxin Fan, Yufeng Li, Yilin Chai, Chao Jing, Zijian Liang, Qingwen Bu, Haidong Cao, Chao Wu, Qifeng Li, Zhenjie Yang, Chenhe Zhang, Hongyang Li, Zuxuan Wu, Junchi Yan, Yu-Gang Jiang
- 发表年份
- 2026
- 访问权限
- 开放获取
摘要
Vision-Language-Action (VLA) models aim for general robot learning by aligning action as a modality within powerful Vision-Language Models (VLMs). Existing VLAs rely on end-to-end supervision to implicitly enable the action decoding process to learn task-relevant features. However, without explicit guidance, these models often overfit to spurious correlations, such as visual shortcuts or environmental noise, limiting their generalization. In this paper, we introduce GuidedVLA, a framework designed to manually guide the action generation to focus on task-relevant factors. Our core insight is to treat the action decoder not as a monolithic learner, but as an assembly of functional components. Individual attention heads are supervised by manually defined auxiliary signals to capture distinct factors. As an initial study, we instantiate this paradigm with three specialized heads: object grounding, spatial geometry, and temporal skill logic. Across simulation and real-robot experiments, GuidedVLA improves success rates in both in-domain and out-of-domain settings compared to strong VLA baselines. Finally, we show that the quality of these specialized factors correlates positively with task performance and that our mechanism yields decoupled, high-quality features. Our results suggest that explicitly guiding action-decoder learning is a promising direction for building more robust and general VLA models.
关键词
相关论文
面向学习与规划的并行可微可达性:具有认证神经动力学与控制器的系统
Keyi Shen, Glen Chou
2026
人工智能增强的智能焊接岛:基础模型革新制造业
Xiwei Wu, Wei Wu, Qiqi Chen 等 9 位作者
Robotics and Computer-Integrated Manufacturing · 2026
基于深度强化学习和动态图神经网络的多任务机器人调度代理
Hedi Boukamcha, Anas Neumann, Monia Rekik 等 6 位作者
Robotics and Computer-Integrated Manufacturing · 2026
基于微调与AAS增强检索的LLM驱动自动化DFA评估
Jiaxin Liu, Xiaofeng Zhou, Suyang Yu 等 8 位作者
Robotics and Computer-Integrated Manufacturing · 2026