HRI

Comparing Apples to Oranges: LLM-Powered Multimodal Intention Prediction in an Object Categorization Task

Hassan Ali, Philipp Allgeuer, Stefan Wermter

Year of publication: 2025
Citations: 4
Access: Open access

Abstract

Human intention-based systems enable robots to perceive and interpret user actions to interact with humans and adapt to their behavior proactively. Therefore, intention prediction is pivotal in creating a natural interaction with social robots in human-designed environments. In this paper, we examine using Large Language Models (LLMs) to infer human intention in a collaborative object categorization task with a physical robot. We propose a novel multimodal approach that integrates user non-verbal cues, like hand gestures, body poses, and facial expressions, with environment states and user verbal cues to predict user intentions in a hierarchical architecture. Our evaluation of five LLMs shows the potential for reasoning about verbal and non-verbal user cues, leveraging their context-understanding and real-world knowledge to support intention prediction while collaborating on a task with a social robot. Video: https://youtu.be/tBJHfAuzohI

Keywords

Computer science, Categorization, Task (project management), Object (grammar), Artificial intelligence, Natural language processing, Information retrieval
