Emotion Recognition from Speech to Improve Human-Robot Interaction
Changrui Zhu, Wasim Ahmad
- Year: 2019
- Citations: 9
Abstract
Speech emotion recognition (SER) has become an important approach to improving human-robot interaction. This paper proposes two methods that take into consideration the size of the database along with other aspects of the models. The first model, intended mainly for relatively small databases, applies the K-nearest neighbors (KNN) algorithm to 1-30 Gammatone frequency cepstral coefficients (GTCCs). It achieves 95.3% overall recognition accuracy on the Berlin Emotional Speech database (EMODB). The second model, intended mainly for relatively large databases, uses 1-30 GTCCs, delta 1-30 GTCCs, delta-delta 1-30 GTCCs, spectral features, and prosodic features as its feature set, with a long short-term memory (LSTM) network as the classifier. It achieves an overall accuracy of 87.5% on the Chinese emotional speech database (CASIA).
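To make the first model's classification step concrete, the following is a minimal sketch of K-nearest-neighbor voting over 30-dimensional feature vectors. It is not the authors' implementation. It assumes each utterance has already been reduced to one vector of 30 coefficients, and it uses synthetic numbers in place of real GTCCs. The Euclidean distance metric, the value of k, the labels, and the name `knn_predict` are all assumptions made for illustration; the abstract does not specify them.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training vectors nearest to query.

    Euclidean distance is an assumption; the abstract does not name a metric.
    """
    # Pair each training vector's distance to the query with its label.
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    # Majority vote over the k closest neighbors.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Synthetic stand-ins for utterance-level vectors of 30 coefficients.
# Real GTCCs would come from a Gammatone filterbank front end (not shown).
N_COEFFS = 30
train_x = [
    [0.0] * N_COEFFS, [0.1] * N_COEFFS, [0.2] * N_COEFFS,  # hypothetical "neutral"
    [1.0] * N_COEFFS, [1.1] * N_COEFFS, [0.9] * N_COEFFS,  # hypothetical "anger"
]
train_y = ["neutral"] * 3 + ["anger"] * 3

query = [0.95] * N_COEFFS
predicted = knn_predict(train_x, train_y, query, k=3)
print(predicted)
```

Because KNN stores the training vectors rather than fitting parameters, it has no training phase to overfit on a small corpus, which is consistent with the abstract's framing of this model as suited to relatively small databases.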