HRI

MIST: Multimodal emotion recognition using DeBERTa for text, Semi-CNN for speech, ResNet-50 for facial, and 3D-CNN for motion analysis

Enguerrand Boitel, Alaa Mohasseb, Ella Haig

Year of publication
2025
Citations
32

Abstract

Human emotion recognition is a rapidly evolving field in artificial intelligence, crucial for improving human–computer interaction. This paper introduces the MIST (Motion, Image, Speech, and Text) framework, a novel multimodal approach to emotion recognition that integrates diverse data modalities. Unlike existing models focusing on unimodal analysis, MIST leverages the complementary strengths of text (using DeBERTa), speech (using Semi-CNN), facial (using ResNet-50), and motion (using 3D-CNN) data to enhance accuracy and reliability. Our evaluation, conducted on the BAUM-1 and SAVEE datasets, demonstrates that MIST significantly outperforms traditional unimodal and some multimodal approaches in emotion recognition tasks. This research advances the field by providing a better understanding of emotional states, with potential applications in social robots, personal assistants, and educational technologies.
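The abstract does not say how the four modality-specific models are combined. As an illustration only, one common option is late fusion, where each model outputs class probabilities and the results are averaged with weights. The sketch below assumes that strategy. The label set, the function names `fuse_late` and `predict`, and the example probabilities are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional

# Hypothetical label set for illustration; not the paper's label set.
EMOTIONS = ["anger", "happiness", "sadness", "neutral"]


def fuse_late(
    probs_by_modality: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Weighted average of per-modality class probabilities (late fusion).

    This is an assumed fusion strategy, not the method described by the authors.
    """
    if not probs_by_modality:
        raise ValueError("at least one modality is required")
    if weights is None:
        # Default: every modality contributes equally.
        weights = {name: 1.0 for name in probs_by_modality}
    total = sum(weights[name] for name in probs_by_modality)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    fused = [0.0] * len(EMOTIONS)
    for name, probs in probs_by_modality.items():
        if len(probs) != len(EMOTIONS):
            raise ValueError(f"{name}: expected {len(EMOTIONS)} probabilities")
        for i, p in enumerate(probs):
            fused[i] += weights[name] * p / total
    return fused


def predict(
    probs_by_modality: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> str:
    """Return the label with the highest fused probability."""
    fused = fuse_late(probs_by_modality, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best]


# Made-up outputs standing in for the text, speech, facial, and motion models.
example = {
    "text": [0.10, 0.60, 0.20, 0.10],
    "speech": [0.30, 0.30, 0.30, 0.10],
    "facial": [0.20, 0.50, 0.20, 0.10],
    "motion": [0.25, 0.25, 0.25, 0.25],
}
print(predict(example))
```

In this sketch an uninformative modality (here, the uniform motion output) dilutes but does not override the modalities that agree, which is one way complementary signals can improve reliability over a single modality.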

Keywords

Computer science; Speech recognition; Emotion recognition; Residual neural network; Artificial intelligence; Motion (physics); Pattern recognition (psychology); Deep learning
