Multi-modal front-end for speaker activity detection in small meetings

Jani Even, Panikos Heracleous, Carlos Toshinori Ishi, Norihiro Hagita

Year
2011
Citations
3

Abstract

Multimodal attention is a key requirement for humanoid robots that must navigate complex environments and act as social, cognitive human partners. To this end, robots have to incorporate attention mechanisms that focus processing on the potentially most relevant stimuli while controlling the sensor orientation to improve the perception of these stimuli. In this paper, we present our implementation of audio-visual saliency-based attention, which we integrated into a system for knowledge-driven audio-visual scene analysis and object-based world modeling. For this purpose, we introduce a novel isophote-based method for proto-object segmentation of saliency maps, a surprise-based auditory saliency definition, and a parametric 3-D model for multimodal saliency fusion. The applicability of the proposed system is demonstrated in a series of experiments.

Keywords

Computer science, Surprise, Artificial intelligence, Robot, Focus (optics), Segmentation, Perception, Computer vision, Auditory scene analysis, Orientation (vector space)
