Speech emotion recognition based on a stacked autoencoders optimized by PSO based grass fibrous root optimization
Zeng Chi, Jialing Li, Abbas Habibi
- Publication year
- 2025
- Citation count
- 2
- Access
- Open access
Abstract
Effective speech emotion recognition (SER) is a significant challenge because human emotions are intricate and subjective. Accurately recognizing emotional states from speech signals has a broad range of practical applications, including healthcare, human-computer interaction, and social robotics. This study introduces an approach that combines deep learning with metaheuristic algorithms to improve the efficiency of SER systems. Specifically, a stacked autoencoder (SAE) serves as the primary model, and its performance is fine-tuned using a nature-inspired hybrid algorithm that combines particle swarm optimization (PSO) with Grass Fibrous Root Optimization (GFRO). The proposed model extracts spectral and pitch features from speech signals, including spectral crest, spectral entropy, spectral flux, and harmonic ratio, to capture emotional cues. The model is evaluated on a standard emotion recognition dataset and compared with several state-of-the-art models, including Convolutional Neural Network (CNN), Support Vector Machine (SVM), Deep Learning (DL), CNN with Iterative Neighborhood Component Analysis (CNN/INCA), and VGG-16, and it achieves high accuracy in identifying various emotional states.
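To make the feature set named in the abstract more concrete, the sketch below computes three of the four listed features (spectral crest, spectral entropy, and spectral flux) per frame with NumPy. It is an illustration under stated assumptions, not the authors' pipeline: the frame length, hop size, Hann window, and the exact normalizations follow common textbook definitions, and the function names are hypothetical. Harmonic ratio is omitted because it needs an autocorrelation-based pitch analysis that the abstract does not specify.

```python
import numpy as np

def frame_spectra(signal, frame_len=512, hop=256):
    """Magnitude spectra of Hann-windowed frames, one row per frame.

    frame_len and hop are assumed values, not taken from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))

def spectral_crest(mag, eps=1e-12):
    """Peak-to-mean ratio of each frame's magnitude spectrum (high for tonal frames)."""
    return mag.max(axis=1) / (mag.mean(axis=1) + eps)

def spectral_entropy(mag, eps=1e-12):
    """Shannon entropy of the normalized power spectrum, scaled by log2(number of bins)."""
    power = mag ** 2
    p = power / (power.sum(axis=1, keepdims=True) + eps)
    return -(p * np.log2(p + eps)).sum(axis=1) / np.log2(mag.shape[1])

def spectral_flux(mag):
    """Euclidean distance between consecutive frame spectra; the first frame gets 0."""
    diff = np.diff(mag, axis=0)
    return np.concatenate([[0.0], np.sqrt((diff ** 2).sum(axis=1))])

# Synthetic check signals: a pure 440 Hz tone and white noise, 1 s at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
noise = np.random.default_rng(0).standard_normal(sr)

tone_mag = frame_spectra(tone)
noise_mag = frame_spectra(noise)

# One feature vector per frame: columns are crest, entropy, flux.
tone_features = np.column_stack(
    [spectral_crest(tone_mag), spectral_entropy(tone_mag), spectral_flux(tone_mag)]
)
```

On these synthetic signals the tone should show a higher crest and a lower entropy than the noise, which is the kind of contrast such descriptors are meant to capture. In an SER setting, per-frame vectors like `tone_features` would typically be aggregated or stacked before being passed to a model such as a stacked autoencoder.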