Robot to Human Interaction with multi-modal conversational engagement
Giuseppe De Simone, Luca Greco, Alessia Saggese, Mario Vento
- Year: 2025
- Citations: 2
Abstract
Natural and effective interaction is important for increasing the social acceptability of social robots. To this end, a robot needs Multi-Engagement capabilities: it must understand whether and when people in the scene want to interact with it, and detect specifically who wants to interact. To address this challenge, we propose a multimodal architecture that exploits both audio and video, combining a deep-learning-based head pose estimator with an active speaker detector. Engagement is modeled through a Behavior Tree, which takes into account the history of engagements with users and selects an appropriate robot behavior accordingly. The proposed architecture has been integrated into a social robot within a ROS framework and runs directly on board an embedded NVIDIA Jetson device. Experimental results show that it achieves an F-Score of 0.82 on the widely adopted UE-HRI dataset. Additionally, the system's human-robot interaction capabilities were assessed through surveys collected in real-world settings, which indicated a positive impact on user engagement and interaction quality.
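The abstract gives no implementation details, so the following is only an illustrative sketch of how head pose and active speaker cues might be fused and passed to a very small behavior tree. Every name, threshold, and rule here (`PersonObservation`, `EngagementTracker`, the 20-degree limits, the frame-streak rule used as a stand-in for engagement history) is an assumption made for illustration and is not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PersonObservation:
    """Hypothetical per-frame cues for one person (not the paper's data format)."""
    person_id: int
    yaw_deg: float     # head yaw relative to the camera, 0 = facing it
    pitch_deg: float   # head pitch relative to the camera
    is_speaking: bool  # assumed output of an active speaker detector


def is_facing_robot(obs: PersonObservation,
                    max_yaw: float = 20.0, max_pitch: float = 20.0) -> bool:
    # Assumed thresholds; the paper does not state any.
    return abs(obs.yaw_deg) <= max_yaw and abs(obs.pitch_deg) <= max_pitch


@dataclass
class EngagementTracker:
    """Keeps a short history: consecutive frames in which a person looks and speaks."""
    frames_required: int = 3
    streaks: Dict[int, int] = field(default_factory=dict)

    def update(self, observations: List[PersonObservation]) -> List[int]:
        engaged, seen = [], set()
        for obs in observations:
            seen.add(obs.person_id)
            if is_facing_robot(obs) and obs.is_speaking:
                self.streaks[obs.person_id] = self.streaks.get(obs.person_id, 0) + 1
            else:
                self.streaks[obs.person_id] = 0
            if self.streaks[obs.person_id] >= self.frames_required:
                engaged.append(obs.person_id)
        # Forget people who are no longer visible.
        for pid in list(self.streaks):
            if pid not in seen:
                del self.streaks[pid]
        return engaged


# Minimal behavior-tree combinators: each node is a function ctx -> bool.
def sequence(*children):
    return lambda ctx: all(child(ctx) for child in children)


def selector(*children):
    return lambda ctx: any(child(ctx) for child in children)


def has_engaged_user(ctx) -> bool:
    return bool(ctx["engaged"])


def start_interaction(ctx) -> bool:
    ctx["action"] = f"interact_with:{ctx['engaged'][0]}"
    return True


def idle(ctx) -> bool:
    ctx["action"] = "idle"
    return True


TREE = selector(sequence(has_engaged_user, start_interaction), idle)


def decide(tracker: EngagementTracker, observations: List[PersonObservation]) -> str:
    ctx = {"engaged": tracker.update(observations), "action": None}
    TREE(ctx)
    return ctx["action"]
```

With `frames_required=2`, a person who faces the robot and speaks for two consecutive frames triggers the interaction branch, while a speaker looking away never does. A real system would replace the streak rule with whatever history model the authors use.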