“Look at this!” learning to guide visual saliency in human-robot interaction
Boris Schauerte, Rainer Stiefelhagen
Year: 2014
Citations: 23
Abstract
We learn to direct visual saliency in multimodal human-robot interaction (i.e., interaction through pointing gestures and spoken references) to highlight and segment arbitrary referent objects. For this purpose, we train a conditional random field to integrate features that reflect low-level visual saliency, the likelihood of salient objects, the probability that a given pixel is pointed at, and, if available, spoken information about the target object's visual appearance. As such, this work integrates several of our ideas and approaches: multi-scale spectral saliency detection, spatially debiased salient object detection, computational attention in human-robot interaction, and learning robust color term models. We demonstrate that this machine-learning-driven integration outperforms previously reported results on two datasets, one without and one with spoken object references. In summary, for automatically detected pointing gestures and automatically extracted object references, our approach improves the rate at which the correct object is included in the initial focus of attention by 10.37% in the absence and 25.21% in the presence of spoken target object information.
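To make the idea of integrating per-pixel features more concrete, the following is a minimal sketch and not the authors' method. It assumes a simple log-linear combination of two feature maps (a saliency map and a pointing map), which corresponds at most to the unary part of a conditional random field; there are no pairwise terms, and the weights are picked by hand rather than learned. The function names `pointing_map` and `combine_feature_maps`, the Gaussian angular falloff, and all numeric values are hypothetical choices for illustration, and the saliency map is random noise standing in for a real detector.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, maps real-valued scores to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def pointing_map(shape, origin, direction, sigma_deg=10.0):
    """Hypothetical pointing feature in the image plane.

    Each pixel gets a value in [0, 1] that decays with the angle between
    the pointing ray (origin + direction, both given as (x, y)) and the
    line from the origin to that pixel. The Gaussian falloff is an
    assumption made for this sketch.
    """
    height, width = shape
    ys, xs = np.mgrid[0:height, 0:width]
    angle_to_pixel = np.arctan2(ys - origin[1], xs - origin[0])
    ray_angle = np.arctan2(direction[1], direction[0])
    # Wrap the angular difference into [-pi, pi].
    diff = np.angle(np.exp(1j * (angle_to_pixel - ray_angle)))
    sigma = np.deg2rad(sigma_deg)
    return np.exp(-0.5 * (diff / sigma) ** 2)


def combine_feature_maps(feature_maps, weights, bias=0.0):
    """Log-linear per-pixel combination of K feature maps (unary term only)."""
    stacked = np.stack(feature_maps, axis=-1)  # shape: H x W x K
    return sigmoid(stacked @ weights + bias)


# Synthetic inputs: random "saliency" and a pointing ray along +x from (5, 20).
rng = np.random.default_rng(0)
saliency = rng.random((40, 60))
pointing = pointing_map((40, 60), origin=(5, 20), direction=(1.0, 0.0))

# Hand-picked weights (assumption); a trained model would learn these.
weights = np.array([1.0, 4.0])
target = combine_feature_maps([saliency, pointing], weights, bias=-2.5)

# Initial focus of attention: the pixel with the highest combined score.
focus = np.unravel_index(np.argmax(target), target.shape)  # (row, column)
print("focus of attention (row, col):", focus)
```

With these weights the pointing map dominates, so the selected pixel lies close to the pointing ray, and the saliency map only decides which pixel near the ray wins. A full conditional random field would additionally couple neighboring pixels, which this sketch omits.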