Run-time Observation Interventions Make Vision-Language-Action Models More Visually Robust
Asher J. Hancock, Allen Z. Ren, Anirudha Majumdar
- 发表年份
- 2024
- 访问权限
- 开放获取
摘要
Vision-language-action (VLA) models trained on large-scale internet data and robot demonstrations have the potential to serve as generalist robot policies. However, despite their large-scale training, VLAs are often brittle to task-irrelevant visual details such as distractor objects or background colors. We introduce Bring Your Own VLA (BYOVLA): a run-time intervention scheme that (1) dynamically identifies regions of the input image that the model is sensitive to, and (2) minimally alters task-irrelevant regions to reduce the model's sensitivity using automated image editing tools. Our approach is compatible with any off the shelf VLA without model fine-tuning or access to the model's weights. Hardware experiments on language-instructed manipulation tasks demonstrate that BYOVLA enables state-of-the-art VLA models to nearly retain their nominal performance in the presence of distractor objects and backgrounds, which otherwise degrade task success rates by up to 40%. Website with additional information, videos, and code: https://aasherh.github.io/byovla/ .
关键词
相关论文
面向大型复杂构件的移动机器人辅助磨削技术综述
Yusen Li, Ziwei Wang, Xiangye Zhu 等 12 位作者
Robotics and Computer-Integrated Manufacturing · 2026
基于物理信息与机器学习的五轴铣削TC4钛合金刀具磨损融合预测模型
Shaoqing Qin, Lida Zhu, Yanpeng Hao 等 10 位作者
Robotics and Computer-Integrated Manufacturing · 2026
面向机器人焊接的领域知识引导学习框架:从非结构化工件类型泛化到未见焊缝拓扑
Xianzhong Zhao, Haotian Liu, Zhaoqi Huang 等 4 位作者
Robotics and Computer-Integrated Manufacturing · 2026
一种利用磁致非线性宽带多向被动减振器抑制机器人铣削低频颤振的新方法
Hao Li, Yuhui Yu, Rui Fu 等 6 位作者
Robotics and Computer-Integrated Manufacturing · 2026