Multimodal perception for enhancing human computer interaction through real-world affect recognition
Karishma Raut, Sujata Kulkarni, Ashwini Sawant
- Publication year: 2025
- Citations: 2
- Access: Open access
Abstract
Human-computer interaction (HCI) can benefit from real-world affect recognition in applications such as healthcare and assistive robotics. Humans express emotions through several modalities, of which audio and visual are the most significant. A unimodal approach that relies on speech or vision alone struggles in natural, dynamic environments. The proposed methodology integrates a pretrained model with a convolutional neural network (CNN) to provide a robust initialization point and to address the limited availability of facial expression data. The multimodal framework enhances discriminative power by combining visual scores with speech. This work addresses the challenges at each stage of the real-world affect recognition framework: data preprocessing, feature extraction, feature fusion, and final classification. A 1D-CNN is trained on spectral and prosodic audio features, while deep visual features are processed by a 2D-CNN. The system's performance was evaluated on the Extended Cohn-Kanade (CK+), Acted Facial Expressions in the Wild (AFEW), and Real-world Affective Faces Database (RAF) datasets, which are commonly used in facial expression recognition research. Experimental results indicate that 2% to 5% of visual data from natural settings went undetected, and that including the audio modality improved performance by providing relevant, supplementary information.
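The combination of visual scores with speech described in the abstract can be illustrated with a minimal score-level fusion sketch. The abstract does not state the exact fusion rule, so the weighted average below, the `visual_weight` value, the function names, and the label list are all assumptions made for illustration, not the authors' implementation. The sketch also shows one plausible way audio can cover samples where no visual data was detected.

```python
import math

# Illustrative label set (assumption; the abstract does not list the classes).
LABELS = ["angry", "happy", "sad"]


def softmax(logits):
    """Convert raw per-class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(visual_probs, audio_probs, visual_weight=0.6):
    """Weighted average of per-class probabilities from the two branches.

    visual_probs may be None when no face was detected in the clip; in that
    case the audio scores are used alone (an assumed fallback policy).
    """
    if visual_probs is None:
        return list(audio_probs)
    if len(visual_probs) != len(audio_probs):
        raise ValueError("both modalities must score the same classes")
    audio_weight = 1.0 - visual_weight
    return [
        visual_weight * v + audio_weight * a
        for v, a in zip(visual_probs, audio_probs)
    ]


def predict(visual_probs, audio_probs, visual_weight=0.6):
    """Return the label with the highest fused probability."""
    fused = fuse_scores(visual_probs, audio_probs, visual_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return LABELS[best]


# Hypothetical branch outputs standing in for the 2D-CNN and 1D-CNN logits.
visual = softmax([2.0, 0.5, 0.1])
audio = softmax([0.2, 1.5, 0.1])

both = predict(visual, audio)        # visual and audio scores combined
audio_only = predict(None, audio)    # visual data undetected, audio only
```

A fixed weight is the simplest choice; in practice the weight would likely be tuned on a validation set or replaced by a learned fusion layer.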