HRI

Lightweight Multimodal Emotion Recognition for Companion Robots: A Deep Learning Framework Integrating Facial and Speech Features

Cheng-Kai Lu, Chien-Wei Lu, Guan Bo Lin

Publication year
2025
Citations
1

Abstract

This paper presents a lightweight multimodal deep learning framework for real-time emotion recognition on resource-constrained companion robots, exemplified by Zenbo Junior II. For spatio-temporal facial feature encoding, the framework integrates a customized GhostNet with Triplet Attention Modules (TAM) and a Frame Attention Network (FAN); for compact speech representation, it employs a depth-optimized one-dimensional convolutional neural network (1D-CNN). Decision-level fusion based on the geometric mean of the two modalities' predictions enhances robustness when one modality's prediction is noisy. The proposed model comprises 0.92 million parameters and requires 0.77 GFLOPs (billion floating-point operations), and it achieves 97.56% accuracy on the RAVDESS dataset and 82.33% on CREMA-D. In contrast to existing approaches that optimize accuracy at the expense of computational efficiency, the proposed method balances accuracy, efficiency, and deployability. These results highlight both the novelty and the feasibility of the framework for real-time emotion monitoring in healthcare and human-robot interaction.
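
The abstract names geometric-mean decision-level fusion but does not give its exact form. The following is a minimal sketch of one common reading, assuming two modalities with equal weight, per-class probability vectors as inputs, and renormalization of the fused scores. The function name `geometric_mean_fusion`, the `eps` floor, and the example probabilities are illustrative assumptions, not taken from the paper.

```python
import math


def geometric_mean_fusion(face_probs, speech_probs, eps=1e-12):
    """Fuse two per-class probability vectors by their geometric mean.

    Hypothetical helper: equal modality weights and renormalization
    are assumptions, not details confirmed by the abstract.
    """
    if len(face_probs) != len(speech_probs):
        raise ValueError("both modalities must score the same classes")
    # Floor each probability at eps so a single zero does not make
    # the renormalization divide by zero when every product is zero.
    fused = [
        math.sqrt(max(f, eps) * max(s, eps))
        for f, s in zip(face_probs, speech_probs)
    ]
    total = sum(fused)
    # Renormalize so the fused scores sum to 1 again.
    return [v / total for v in fused]


# Made-up example scores for three emotion classes.
face = [0.7, 0.2, 0.1]
speech = [0.4, 0.5, 0.1]

fused = geometric_mean_fusion(face, speech)
predicted = max(range(len(fused)), key=fused.__getitem__)
```

Because the geometric mean multiplies the two scores, a class must be plausible under both modalities to rank highly, which is one way such a rule can dampen a confident but wrong prediction from a single noisy modality.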

Keywords

Deep learning; Robustness (evolution); Convolutional neural network; Emotion recognition; Facial expression; Novelty; Feature extraction; Feature (linguistics)
