Facial Expression Recognition With an Efficient Mix Transformer for Affective Human-Robot Interaction
Farshad Safavi, Kulin Patel, Ramana Vinjamuri
- Publication year: 2025
- Citations: 6
Abstract
Emotion recognition can significantly enhance interactions between humans and robots, particularly in shared tasks and collaborative processes. Facial Expression Recognition (FER) allows affective robots to adapt their behavior in a socially appropriate manner. However, the potential of efficient Transformers for FER remains underexplored. Additionally, leveraging self-attention mechanisms to create segmentation masks that accentuate facial landmarks for improved accuracy has not been fully investigated. Furthermore, current FER methods lack computational efficiency and scalability, limiting their applicability in real-time scenarios. Therefore, we developed the robust, scalable, and generalizable EmoFormer model, incorporating an efficient Mix Transformer block along with a novel fusion block. Our approach scales across a range of models from EmoFormer-B0 to EmoFormer-B2. The main innovation lies in the fusion block, which uses element-wise multiplication of facial landmarks to emphasize their role in the feature map. This integration of local and global attention creates powerful representations. The efficient self-attention mechanism within the Mix Transformer establishes connections among various facial regions. It enhances efficiency while maintaining accuracy in emotion classification from facial landmarks. We evaluated our approach for both categorical and dimensional facial expression recognition on four datasets: FER2013, AffectNet-7, AffectNet-8, and DEAP. Our ensemble method achieved state-of-the-art results, with accuracies of 77.35% on FER2013, 67.71% on AffectNet-7, and 65.14% on AffectNet-8. For the DEAP dataset, our method achieved 98.07% accuracy for arousal and 97.86% for valence, demonstrating the robustness and generalizability of our models. As an application of our method, we implemented EmoFormer in an affective robotic arm, enabling the human-robot interaction system to adjust its speed based on the user's facial expressions. 
This was validated through a user experiment with six subjects, demonstrating the feasibility and effectiveness of our approach in creating emotionally intelligent human-robot interactions. Overall, our results demonstrate that EmoFormer is a robust, efficient, and scalable solution for FER, with significant potential for advancing human-robot interaction through emotion-aware robotics.
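The fusion step described in the abstract, element-wise multiplication that emphasizes facial landmarks in the feature map, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the landmark mask is produced, what its value range is, or where in the network the fusion happens. The function name `fuse_with_landmark_mask`, the channel-first layout, and the toy values are assumptions made purely for illustration, and a real model would use a tensor library rather than nested lists.

```python
def fuse_with_landmark_mask(features, mask):
    """Weight every channel of a feature map by one spatial mask.

    features: list of C channels, each an H x W list of floats (assumed layout).
    mask:     H x W list of weights, e.g. near 1 on landmarks and near 0 elsewhere
              (hypothetical; the abstract does not give the mask's range).
    """
    fused = []
    for channel in features:
        same_height = len(channel) == len(mask)
        same_width = all(len(row) == len(m_row) for row, m_row in zip(channel, mask))
        if not (same_height and same_width):
            raise ValueError("mask must match the spatial size of every channel")
        # Element-wise product: positions with a low mask weight are suppressed.
        fused.append([
            [value * weight for value, weight in zip(row, m_row)]
            for row, m_row in zip(channel, mask)
        ])
    return fused


# Toy input: 2 channels of size 3 x 3 (values are made up).
features = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
    [[-1.0, 0.5, 2.0], [0.0, 1.5, -2.0], [3.0, -0.5, 1.0]],
]
# Hypothetical mask: the middle row and column stand in for landmark regions.
mask = [
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
    [0.0, 0.5, 0.0],
]
fused = fuse_with_landmark_mask(features, mask)
```

In this toy case, positions with mask weight 1 pass through unchanged, weight 0.5 halves the response, and weight 0 removes it, which is the sense in which the multiplication accentuates the masked regions.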