Pitch extraction in Human-Robot interaction

Martin Heckmann, Frank Joublin, Kazuhiro Nakadai

Year
2010
Citations
3

Abstract

We present a system for real-time extraction of the fundamental frequency, i.e. pitch, on a humanoid robot. The system extracts pitch using an 8-channel microphone array mounted on the Honda humanoid robot in a realistic Human-Robot interaction scenario. The main building blocks of the system are multi-channel signal enhancement followed by robust pitch extraction and tracking. The signal enhancement is based on 8-channel Geometric Source Separation. For the pitch extraction, the signal is first transformed into the frequency domain with a Gammatone filter bank. Next, a histogram of zero-crossing distances is calculated from all filter bank signals. During the calculation of the histogram, spurious side peaks resulting from harmonics and sub-harmonics of the true fundamental frequency are inhibited. The resulting histogram then serves as input to a grid-based Bayesian tracker, which applies Bayesian filtering in a forward step and Bayesian smoothing in a backward step over a 100 ms time window. We demonstrate the performance of the system in a scenario where male and female speakers utter different phrases while standing at a normal interaction distance from the robot. For the evaluation, we compare the pitch tracking results obtained from a clean headset signal with those obtained from the robot's microphone signals. The results show that the tracking performance degrades only slightly in the realistic interaction scenario compared to the headset recordings.
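The idea of a zero-crossing distance histogram can be illustrated with a minimal single-channel sketch. This is not the authors' implementation: it skips the Geometric Source Separation, the Gammatone filter bank, the inhibition of harmonic and sub-harmonic side peaks, and the Bayesian tracker, and it runs on a synthetic sine tone rather than speech. All function names, the 16 kHz sample rate, and the 200 Hz test tone are assumptions chosen for illustration.

```python
import math
from collections import Counter


def upward_zero_crossings(signal):
    """Return the sample indices where the signal crosses zero going upward."""
    return [n for n in range(1, len(signal)) if signal[n - 1] < 0.0 <= signal[n]]


def zero_crossing_distance_histogram(signal):
    """Count how often each distance (in samples) between consecutive
    upward zero crossings occurs."""
    crossings = upward_zero_crossings(signal)
    distances = [b - a for a, b in zip(crossings, crossings[1:])]
    return Counter(distances)


def estimate_pitch(signal, sample_rate):
    """Estimate the fundamental frequency in Hz from the histogram peak.

    Returns None when fewer than two upward zero crossings are found.
    """
    histogram = zero_crossing_distance_histogram(signal)
    if not histogram:
        return None
    period_samples, _count = histogram.most_common(1)[0]
    return sample_rate / period_samples


# Synthetic test tone (assumed values, not taken from the paper).
sample_rate = 16000  # Hz
f0 = 200.0           # Hz
signal = [math.sin(2 * math.pi * f0 * n / sample_rate + 0.3) for n in range(1600)]

estimate = estimate_pitch(signal, sample_rate)
print(estimate)
```

For a pure tone, every distance between upward zero crossings equals one period, so the histogram has a single peak. Real speech passed through a filter bank yields peaks at multiples and fractions of the period as well, which is why the system described above inhibits those side peaks before tracking.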

Keywords

Computer science, Histogram, SIGNAL (programming language), Computer vision, Robot, Frequency domain, Artificial intelligence, Microphone array, Smoothing, Microphone
