Enhancing Face-to-Emotion Recognition with Vision Transformer and Human-in-the-Loop Approach
Mahedi Hasan, Naimul Islam Shuvo, Airin Akter, Fahim Shakil Tamim, Nazir Ahmed, Naeem Mia, Shahinur Alam
- Publication year: 2025
- Citations: 1
Abstract
Recognizing emotions from facial expressions is essential for applications such as human-computer interaction, mental health, and social robotics. Deep learning approaches have achieved promising performance, but their generalization across diverse datasets and real-world conditions remains limited. In this paper, we introduce a Vision Transformer (ViT) integrated with a Human-in-the-Loop (HITL) framework that improves emotion detection accuracy, robustness, and cross-dataset generalizability. The proposed framework incorporates human expertise during the model's learning and evaluation phases, allowing targeted correction of the model's outputs. It also helps identify mislabeled instances and thus refines the decision boundaries with little human effort. We conducted experiments on four benchmark datasets: FER2013, RAF-DB, AffectNet-7, and ExpW. Comparative analysis shows that incorporating HITL into the ViT significantly improves classification accuracy across all datasets, particularly in challenging cross-domain settings. To make this process more efficient, we introduce a confidence-based intervention method in which only ambiguous predictions are reviewed, reducing the required manual effort. We also implement incremental model updates that let the system improve continuously without retraining from scratch; the weights of the trained model are updated through backpropagation. This combination of human feedback and the ViT makes the model more reliable and adaptable for real-world use. The ViT integrated with human feedback achieves 7%, 5%, 10%, and 13% higher accuracy on FER2013, RAF-DB, AffectNet-7, and ExpW, respectively, than the baseline ViT model, and outperforms existing methods. However, the proposed model requires additional time to correct mispredictions.
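The confidence-based intervention and incremental update loop described above can be illustrated with a minimal sketch. This is not the authors' implementation; it rests on several assumptions: "ambiguous" is taken to mean a top-class softmax probability below a threshold (the value 0.6 is arbitrary), the ViT is replaced by a linear classification head over fixed feature vectors, `ask_human` is a hypothetical annotator callback, and the "incremental update" is a single cross-entropy gradient step per corrected sample rather than full backpropagation through the ViT.

```python
import math
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def head_logits(weights: Matrix, features: Sequence[float]) -> List[float]:
    """Linear classification head (assumed stand-in for the ViT): one score per class."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def select_for_review(weights: Matrix, batch: Sequence[Sequence[float]],
                      threshold: float = 0.6) -> List[int]:
    """Indices whose top-class probability falls below `threshold` (assumed ambiguity rule)."""
    return [i for i, x in enumerate(batch)
            if max(softmax(head_logits(weights, x))) < threshold]


def sgd_step(weights: Matrix, features: Sequence[float], label: int,
             lr: float = 0.1) -> float:
    """One cross-entropy gradient step on the head, in place; returns the pre-step loss."""
    probs = softmax(head_logits(weights, features))
    for k, row in enumerate(weights):
        # For softmax + cross-entropy, dL/dz_k = p_k - y_k; dz_k/dW[k][j] = x_j.
        grad_k = probs[k] - (1.0 if k == label else 0.0)
        for j in range(len(row)):
            row[j] -= lr * grad_k * features[j]
    return -math.log(probs[label])


def hitl_round(weights: Matrix, batch: Sequence[Sequence[float]],
               ask_human: Callable[[int], int], threshold: float = 0.6,
               lr: float = 0.1) -> List[int]:
    """Route only ambiguous samples to a human, then update the head incrementally."""
    flagged = select_for_review(weights, batch, threshold)
    for i in flagged:
        corrected = ask_human(i)  # hypothetical annotator callback returning a class index
        sgd_step(weights, batch[i], corrected, lr)
    return flagged


if __name__ == "__main__":
    weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # 3 classes, 2 features
    batch = [[5.0, 0.0], [0.1, 0.1]]  # one confident sample, one ambiguous sample
    print(select_for_review(weights, batch))  # [1]
    print(hitl_round(weights, batch, ask_human=lambda i: 2))  # [1]
```

Only the second sample is flagged, because its top-class probability is about 0.34 while the first sample's is about 0.99. Confident predictions therefore never reach the annotator, which is the mechanism the abstract credits with reducing manual effort. Updating the weights in place after each correction mirrors the idea of improving the model without retraining from scratch.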