PERCEPTION

Multimodal perception for enhancing human computer interaction through real-world affect recognition

Karishma Raut, Sujata Kulkarni, Ashwini Sawant

Publication year
2025
Citations
2
Access
Open access

Abstract

Human-computer interaction can benefit from real-world affect recognition in applications such as healthcare and assistive robotics. Humans express emotions through various modalities, of which the audio and visual channels are the most significant. A unimodal approach that relies on speech or vision alone is challenging in natural, dynamic environments. The proposed methodology integrates a pretrained model with a convolutional neural network (CNN) to provide a robust initialization point and to address the limited availability of facial expression data. The multimodal framework enhances discriminative power by combining visual scores with speech. This work addresses the challenges at each stage of the real-world affect recognition framework, including data preprocessing, feature extraction, feature fusion, and final classification. A 1D-CNN is trained on spectral and prosodic audio features, while deep visual features are processed by a 2D-CNN. The proposed system's performance was evaluated on the Extended Cohn-Kanade (CK+), Acted Facial Expressions in the Wild (AFEW), and Real-world Affective Faces Database (RAF) datasets, which are commonly used in face recognition research. Experimental results indicate that 2% to 5% of visual data from natural settings went undetected, and that including the audio modality improved performance by providing relevant, supplementary information.
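
The abstract mentions combining visual scores with speech and notes that some visual data from natural settings goes undetected. As a rough illustration only, the sketch below shows one common way such a combination could be done: a weighted-sum late fusion of per-class scores that falls back to audio when no visual scores are available. The fusion rule, the `visual_weight` value, the `EMOTIONS` label set, and the function names are all assumptions made for this example; the listing does not state how the authors actually fuse the two modalities.

```python
from typing import List, Optional, Sequence

# Hypothetical label set for illustration; the paper's classes may differ.
EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]


def fuse_scores(
    visual: Optional[Sequence[float]],
    audio: Sequence[float],
    visual_weight: float = 0.6,  # assumed value, not taken from the paper
) -> List[float]:
    """Weighted-sum late fusion of per-class scores (an assumed rule).

    If `visual` is None (for example, no face was detected in the frame),
    the audio scores are used on their own.
    """
    if visual is None:
        fused = list(audio)
    else:
        if len(visual) != len(audio):
            raise ValueError("visual and audio scores must have equal length")
        fused = [
            visual_weight * v + (1.0 - visual_weight) * a
            for v, a in zip(visual, audio)
        ]
    total = sum(fused)
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    # Renormalize so the fused scores sum to 1.
    return [s / total for s in fused]


def predict(
    visual: Optional[Sequence[float]],
    audio: Sequence[float],
    visual_weight: float = 0.6,
) -> str:
    """Return the label with the highest fused score."""
    fused = fuse_scores(visual, audio, visual_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return EMOTIONS[best]


# Example scores, invented for this sketch (not experimental data).
visual_scores = [0.05, 0.05, 0.05, 0.60, 0.15, 0.05, 0.05]
audio_scores = [0.10, 0.05, 0.05, 0.30, 0.40, 0.05, 0.05]

label_both = predict(visual_scores, audio_scores)
label_audio_only = predict(None, audio_scores)
```

With these made-up scores, the fused prediction follows the stronger visual evidence, while the audio-only call shows how the second modality can still yield a label when the visual branch produces nothing.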

Keywords

Affect (linguistics), Human–computer interaction, Perception, Computer science, Multimodal interaction, Psychology, Communication, Neuroscience
