Probabilistic integration of audiovisual information to localize sound source in human-robot interaction
B. Chen, M. Meguro, M. Kaneko
- Publication year
- 2004
- Citations
- 7
Abstract
This paper proposes a method for estimating the position of a sound source in human-robot interaction by fusing auditory and visual information with a Bayesian network. We first integrate multi-channel audio signals with a depth image of the environment to generate a likelihood map for sound source localization. However, this integration, denoted "MICs", does not always locate the sound source correctly. To correct such localization failures, we combine the likelihood values generated by "MICs" with the skin-color distribution in an image, according to whether the audio signal is classified as speech or non-speech. The audio classifier is based on a support vector machine (SVM), and the skin-color distribution is modeled with a Gaussian mixture model (GMM). Given the evidence from MICs, the SVM, and the GMM, the trained Bayesian network infers whether each pixel in the image corresponds to the sound source. Finally, experimental results are presented to show the effectiveness of the proposed method.
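
The per-pixel fusion described in the abstract might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' network: the abstract does not give the Bayesian network's structure or its trained conditional probabilities, so the function names (`posterior_source`, `fuse_map`), the uniform prior, and the rule that skin color counts as evidence only when the audio is classified as speech are all hypothetical.

```python
def posterior_source(mic_like, skin_prob, is_speech, prior=0.5):
    """Posterior that one pixel is the sound source.

    Hypothetical model: the three inputs stand in for the MICs likelihood,
    the GMM skin-color probability, and the SVM speech/non-speech label.
    """
    # Soft evidence from the audio/depth likelihood map ("MICs").
    on = prior * mic_like
    off = (1.0 - prior) * (1.0 - mic_like)
    # Assumption: skin color is informative only when the audio is speech,
    # since a speaking person is then the likely source.
    if is_speech:
        on *= skin_prob
        off *= 1.0 - skin_prob
    total = on + off
    # Fall back to the prior if the evidence is degenerate.
    return on / total if total > 0 else prior


def fuse_map(mic_map, skin_map, is_speech, prior=0.5):
    """Apply the per-pixel fusion to two equally sized 2-D maps (lists of rows)."""
    return [
        [posterior_source(m, s, is_speech, prior) for m, s in zip(mic_row, skin_row)]
        for mic_row, skin_row in zip(mic_map, skin_map)
    ]
```

In this sketch, a pixel with a moderate MICs likelihood is pushed toward "source" when the audio is speech and the pixel is skin-colored, and is left at its MICs value when the audio is non-speech.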