PERCEPTION

Multi-modal front-end for speaker activity detection in small meetings

Jani Even, Panikos Heracleous, Carlos Toshinori Ishi, Norihiro Hagita

Year: 2011
Citations: 3

Abstract

Multimodal attention is a key requirement for humanoid robots in order to navigate in complex environments and act as social, cognitive human partners. To this end, robots have to incorporate attention mechanisms that focus the processing on the potentially most relevant stimuli while controlling the sensor orientation to improve the perception of these stimuli. In this paper, we present our implementation of audio-visual saliency-based attention that we integrated in a system for knowledge-driven audio-visual scene analysis and object-based world modeling. For this purpose, we introduce a novel isophote-based method for proto-object segmentation of saliency maps, a surprise-based auditory saliency definition, and a parametric 3-D model for multimodal saliency fusion. The applicability of the proposed system is demonstrated in a series of experiments.

Keywords

Computer science, Surprise, Artificial intelligence, Robot, Focus (optics), Segmentation, Perception, Computer vision, Auditory scene analysis, Orientation (vector space)
