IntentVLM: Open-Vocabulary Intention Recognition through Forward-Inverse Modeling with Video-Language Models
Hamed Rahimi, Clemence Grislain, Adrien Jacquet Cretides, Olivier Sigaud, Mohamed Chetouani
- Year
- 2026
- Access
- Open access
Abstract
Improving the effectiveness of human-robot interaction requires social robots to accurately infer human goals through robust intention understanding. This challenge is particularly critical in multimodal settings, where agents must integrate heterogeneous signals including text, visual cues to form a coherent interpretation of user intent. This paper presents IntentVLM, a novel two-stage video-language framework designed for open-vocabulary human intention recognition. The approach is inspired by forward-inverse modeling in cognitive science by decomposing intention understanding into goal candidate generation followed by structured inference through selection, effectively reducing hallucinations in latent reasoning. Evaluated on the IntentQA and Inst-IT Bench datasets, IntentVLM achieves state-of-the-art results with up to 80% accuracy, notably surpassing the baseline performance by 30% and matches human performance. Our findings demonstrate that this structured reasoning approach enhances open-vocabulary intention understanding without catastrophic forgetting, offering a robust foundation for human-centered robotics.
Keywords
Related papers
Review and perspectives on multimodal perception, mutual cognition, and embodied execution for human–robot collaboration in Industry 5.0
Kai Ding, Qingyuan Mao, Yaqian Zhang +3 more
Robotics and Computer-Integrated Manufacturing · 2026
Agentic HRC: Achieving context alignment via memory for Human–Robot Collaboration
Jiahui Si, Wenchao Li, Xi Chen +4 more
Robotics and Computer-Integrated Manufacturing · 2026
Towards human-centric manufacturing: Task planning under uncertainties in human–robot collaborative assembly
Yingchao You, Ze Ji, Changyun Wei
Robotics and Computer-Integrated Manufacturing · 2026
Adaptive Physics-informed Transformer with Gaussian process residual compensation for inverse dynamics modeling in Human–Robot Collaboration
Rui Qian, Xi Zhang, Dongpeng Li +2 more
Robotics and Computer-Integrated Manufacturing · 2026