HRI

Social-LLaVA: Enhancing Social Robot Navigation through Human-Language Reasoning

Amirreza Payandeh, Daeun Song, Mohammad Nazeri, Jing Liang, Praneel Mukherjee, Amir Hossain Raj, Yangzhe Kong, Dinesh Manocha, Xuesu Xiao

Publication year
2025
Citations
3

Abstract

As mobile robots become increasingly common in human-centric environments, social navigation (adhering to unwritten social norms rather than merely avoiding pedestrians) has drawn growing attention. Existing methods, from hand-crafted techniques to learning-based approaches, often overlook the nuanced context and scene understanding that humans naturally exhibit. Inspired by studies indicating the critical role of language in cognition and reasoning, we propose a new approach to bridge robot perception and socially aware actions through human-like language reasoning. We introduce Social robot Navigation via Explainable Interactions (SNEI), a human-annotated vision-language dataset comprising over 40K Visual Question Answering (VQA) pairs across 2K unique social scenarios, drawn from diverse, unstructured public spaces. SNEI contains perception, prediction, chain-of-thought reasoning, action, and explanation, thereby allowing robots to interpret social contexts in human language. We fine-tune a Vision-Language Model, Social-LLaVA, on SNEI to demonstrate the potential of language-guided reasoning for high-level navigation tasks. Experimental evaluations, both quantitative and qualitative, demonstrate that Social-LLaVA can outperform state-of-the-art models.

Keywords

Bridge (graph theory); Robot; Social robot; Perception; Context (archaeology); Cognition; Mobile robot; Natural language
