HRI

TMNet: Transformer-fused multimodal framework for emotion recognition via EEG and speech

Md. Muntasir Ul Alam, Mohamed Abubakar Dini, Dongseon Kim, Taesoo Jun

Year
2025
Citations
12

Abstract

Emotion recognition sits at the intersection of psychology, human–computer interaction, and social robotics, and the field increasingly demands more accurate frameworks. Research has shifted away from single-modal approaches toward multimodal emotion recognition, which can improve performance by integrating multiple data sources. This paper introduces TMNet, a multimodal emotion recognition framework that combines speech and electroencephalography (EEG) signals. TMNet uses a Transformer model to fuse CNN-BiLSTM and BiGRU architectures into a unified multimodal representation for emotion recognition. The speech data come from the RAVDESS, SAVEE, TESS, and CREMA-D datasets, and the EEG signals were captured with the Muse headband. On combined speech and EEG signals, the multimodal model achieves an accuracy of 98.89%.
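The abstract describes the fusion step only at a high level. As a rough illustration of the general idea, and not TMNet's actual architecture, dimensions, or weights (none of which are given here), the sketch below treats two hypothetical per-modality embeddings, standing in for the outputs of the upstream encoder branches, as a two-token sequence, applies single-head scaled dot-product self-attention across them, and mean-pools the result into one fused vector. The function names `self_attention` and `fuse_modalities` are illustrative, and the learned query/key/value projections are assumed to be identity matrices to keep the example dependency-free.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention.

    Simplifying assumption: queries, keys, and values are the tokens
    themselves, i.e. the learned Q/K/V projections are identity matrices.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in tokens:
        # Attention weights of this query over every token (both modalities).
        weights = softmax([dot(q, k) * scale for k in tokens])
        # Output for this token is the weighted sum of the value vectors.
        outputs.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return outputs


def fuse_modalities(speech_emb: Vector, eeg_emb: Vector) -> Vector:
    """Hypothetical fusion: one token per modality, attend, then mean-pool."""
    if len(speech_emb) != len(eeg_emb):
        raise ValueError("embeddings must share a dimension")
    attended = self_attention([speech_emb, eeg_emb])
    d = len(speech_emb)
    # Mean-pool the attended tokens into a single fused representation.
    return [sum(tok[i] for tok in attended) / len(attended) for i in range(d)]


if __name__ == "__main__":
    # Toy 2-D embeddings; real encoder outputs would be much larger.
    print(fuse_modalities([1.0, 0.0], [0.0, 1.0]))  # approximately [0.5, 0.5]
```

Mean pooling makes the fused vector independent of the order in which the modalities are fed in. A full Transformer encoder layer would add learned projections, multiple heads, residual connections, layer normalization, and a feed-forward sublayer, and some designs pool through a dedicated classification token instead; the abstract does not say which choices TMNet makes.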

Keywords

Electroencephalography, Transformer, Speech recognition, Computer science, Emotion recognition, Psychology, Engineering, Neuroscience, Electrical engineering, Voltage
