Look, Focus, Act: Efficient and Robust Robot Learning via Human Gaze and Foveated Vision Transformers
Ian Chuang, Jinyu Zou, Andrew Lee, Dechen Gao, Iman Soltani
- Year
- 2025
- Access
- Open access
Abstract
Human vision is a highly active process driven by gaze, which directs attention to task-relevant regions through foveation, dramatically reducing visual processing. In contrast, robot learning systems typically rely on passive, uniform processing of raw camera images. In this work, we explore how incorporating human-like active gaze into robotic policies can enhance efficiency and robustness. We develop GIAVA (Gaze Integrated Active-Vision ALOHA), a robot vision system that emulates human head and neck movement, and gaze adjustment for foveated processing. Extending the AV-ALOHA robot platform, we introduce a framework for simultaneously collecting eye-tracking, perspective control, and robot manipulation demonstration data from a human operator. We also open-source a simulation benchmark and dataset for training robot policies that incorporate human gaze. Inspired by recent work in foveated image segmentation and given the widespread use of Vision Transformers (ViTs) in robot learning, we integrate gaze information into ViTs using a foveated patch tokenization scheme. Compared to uniform patch tokenization, this significantly reduces the number of tokens, and thus computation. Our results show that our method for foveated robot vision drastically reduces computational overhead, and enhances robustness to background distractors. Notably, on certain high-precision tasks, foveated vision also improves performance, as reflected in higher success rates. Together, these findings suggest that human-inspired foveated visual processing offers untapped potential and should be further considered as a useful inductive bias in robotic vision systems. https://ian-chuang.github.io/gaze-av-aloha/
Keywords
Related papers
State-of-the-art in mobile robot-assisted grinding technologies for large-scale complex components
Yusen Li, Ziwei Wang, Xiangye Zhu +9 more
Robotics and Computer-Integrated Manufacturing · 2026
A fusion prediction model of tool wear based on physical information and machine learning in five-axis milling TC4 titanium alloy
Shaoqing Qin, Lida Zhu, Yanpeng Hao +7 more
Robotics and Computer-Integrated Manufacturing · 2026
A novel method of suppressing low-frequency chatter in robotic milling using magnetically-induced nonlinear broadband multidirectional passive vibration absorber
Hao Li, Yuhui Yu, Rui Fu +3 more
Robotics and Computer-Integrated Manufacturing · 2026
Enhancing robotic milling quality via a novel piezoelectric active damping toolholder
Bo Li, Yuanbo Zhao, Huijie Xiao +3 more
Robotics and Computer-Integrated Manufacturing · 2026