TMNet: Transformer-fused multimodal framework for emotion recognition via EEG and speech
Md. Muntasir Ul Alam, Mohamed Abubakar Dini, Dongseon Kim, Taesoo Jun
- Year: 2025
- Citations: 12
Abstract
In the evolving field of emotion recognition, which intersects psychology, human–computer interaction, and social robotics, there is a growing demand for more accurate frameworks. The traditional reliance on single-modal approaches has given way to multimodal emotion recognition, which improves performance by integrating multiple data sources. This paper introduces TMNet, a multimodal emotion recognition framework that combines speech and electroencephalography (EEG) signals. The framework uses a Transformer model to fuse the CNN-BiLSTM and BiGRU architectures into a unified multimodal representation. Using a diverse set of speech datasets (RAVDESS, SAVEE, TESS, and CREMA-D) along with EEG signals captured via the Muse headband, the multimodal model achieves an accuracy of 98.89% for speech and EEG signal processing.
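The fusion idea described in the abstract can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation. The abstract does not say which branch handles which modality, so pairing CNN-BiLSTM with speech and BiGRU with EEG is an assumption, as are all class names, layer sizes, the 40 speech features per frame, the 4 EEG channels, and the 7 emotion classes.

```python
import torch
import torch.nn as nn


class SpeechBranch(nn.Module):
    """CNN followed by a BiLSTM over speech feature frames (assumed pairing)."""

    def __init__(self, n_features=40, hidden=64, d_model=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool1d(2),  # halves the time axis
        )
        self.lstm = nn.LSTM(64, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, d_model)

    def forward(self, x):  # x: (batch, time, n_features)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)  # (batch, time/2, 64)
        out, _ = self.lstm(h)  # (batch, time/2, 2*hidden)
        return self.proj(out.mean(dim=1))  # (batch, d_model)


class EEGBranch(nn.Module):
    """BiGRU over raw EEG samples (assumed pairing and channel count)."""

    def __init__(self, n_channels=4, hidden=64, d_model=128):
        super().__init__()
        self.gru = nn.GRU(n_channels, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, d_model)

    def forward(self, x):  # x: (batch, time, n_channels)
        out, _ = self.gru(x)  # (batch, time, 2*hidden)
        return self.proj(out.mean(dim=1))  # (batch, d_model)


class FusionSketch(nn.Module):
    """Treats each modality embedding as one token and fuses them with a Transformer encoder."""

    def __init__(self, d_model=128, n_classes=7):
        super().__init__()
        self.speech = SpeechBranch(d_model=d_model)
        self.eeg = EEGBranch(d_model=d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, dim_feedforward=256, batch_first=True
        )
        self.fuser = nn.TransformerEncoder(layer, num_layers=1)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, speech, eeg):
        # Stack the two modality embeddings into a length-2 token sequence.
        tokens = torch.stack([self.speech(speech), self.eeg(eeg)], dim=1)
        fused = self.fuser(tokens).mean(dim=1)  # (batch, d_model)
        return self.head(fused)  # (batch, n_classes) unnormalized logits


model = FusionSketch()
speech_batch = torch.randn(8, 100, 40)  # 8 clips, 100 frames, 40 features
eeg_batch = torch.randn(8, 256, 4)      # 8 windows, 256 samples, 4 channels
logits = model(speech_batch, eeg_batch)
print(logits.shape)
```

Self-attention across the two modality tokens lets each embedding be reweighted by the other before classification, which is one simple way to obtain a unified multimodal representation. The paper's actual fusion design may differ.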