Identification and Localization of One or Two Concurrent Speakers in a Binaural Robotic Context

Karim Youssef, Katsutoshi Itoyama, Kazuyoshi Yoshii

Year
2015
Citations
9

Abstract

This paper presents a method for identifying one or two concurrent speakers and estimating their azimuths from simultaneous utterances. The method is applicable to human-machine interaction and robot audition. Identification and localization have rarely been addressed jointly, and related works rely on time-frequency exploitation strategies to extract and process each source's contribution to the received signal. The presented method relies on training performed with one speaker at a time, yet it can exploit a single speech segment to identify and localize two speakers. A binaural front-end based on cochlear filtering extracts equivalent rectangular bandwidth frequency cepstral coefficients (ERBFCCs) and interaural level difference (ILD) features. Artificial neural networks (ANNs) exploit the ERBFCCs to provide identity information, and a histogram-based exploitation of the ILDs provides azimuth angle information. The method was evaluated in conditions that included overlapping segments, noise, and sound reflections, and its efficiency was demonstrated. Even with fully overlapping utterances, we reached an 83% identification rate for both speakers, an 82% estimation accuracy for both azimuths, and a 68% rate of correct joint identity and azimuth estimation. At least one speaker was correctly identified and localized in more than 99% of the tests for utterances lasting about 5 s.
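
The histogram-based use of ILDs mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the details, so the per-frame nearest-template matching, the function names (`ild_db`, `nearest_azimuth`, `top_azimuths`), the template table, and the toy values are all assumptions made for illustration. The idea shown is only that each frame votes for the azimuth whose single-speaker ILD pattern it most resembles, and the most voted azimuths are kept.

```python
import numpy as np

def ild_db(left_energy, right_energy, eps=1e-12):
    """ILD in dB from per-frame, per-band energies (shape: frames x bands)."""
    return 10.0 * np.log10((left_energy + eps) / (right_energy + eps))

def nearest_azimuth(ild, templates, azimuths):
    """Assign each frame to the azimuth whose ILD template is closest.

    templates: hypothetical single-speaker ILD patterns (n_azimuths x bands).
    """
    dist = ((ild[:, None, :] - templates[None, :, :]) ** 2).sum(axis=2)
    return azimuths[np.argmin(dist, axis=1)]

def top_azimuths(votes, azimuths, n_sources=2):
    """Histogram the per-frame votes and return the n_sources most voted azimuths."""
    counts = np.array([(votes == a).sum() for a in azimuths])
    best = np.argsort(counts)[::-1][:n_sources]
    return sorted(azimuths[best].tolist())

# Toy data (assumed): three candidate azimuths, two frequency bands.
azimuths = np.array([-45, 0, 45])
templates = np.array([[-6.0, -6.0], [0.0, 0.0], [6.0, 6.0]])

# Ten frames: five dominated by the -45 degree source, four by +45, one ambiguous.
frame_ild = np.array([[-6.0, -6.0]] * 5 + [[6.0, 6.0]] * 4 + [[0.0, 0.0]])
right = np.ones_like(frame_ild)
left = 10.0 ** (frame_ild / 10.0)

votes = nearest_azimuth(ild_db(left, right), templates, azimuths)
estimated = top_azimuths(votes, azimuths, n_sources=2)
print(estimated)
```

The sketch assumes that each frame is dominated by one speaker, which is what lets templates learned from one speaker at a time be applied to a two-speaker segment.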

Keywords

Computer science, Binaural recording, Azimuth, Speech recognition, Identification (biology), Exploit, Context (archaeology), Mel-frequency cepstrum, Identity (music), SIGNAL (programming language)
