Lightweight Multimodal Emotion Recognition for Companion Robots: A Deep Learning Framework Integrating Facial and Speech Features
Cheng-Kai Lu, Chien-Wei Lu, Guan Bo Lin
- Publication year: 2025
- Citations: 1
Abstract
This paper presents a lightweight multimodal deep learning framework for real-time emotion recognition on resource-constrained companion robots, exemplified by Zenbo Junior II. The framework integrates a customized GhostNet with Triplet Attention Modules (TAM) and a Frame Attention Network (FAN) for spatio-temporal facial feature encoding, and employs a depth-optimized one-dimensional convolutional neural network (1D-CNN) for compact speech representation. Decision-level fusion based on the geometric mean enhances robustness to noisy modality predictions. The proposed model comprises 0.92 million parameters and requires 0.77 GFLOPs (billion floating-point operations), achieving 97.56% accuracy on the RAVDESS dataset and 82.33% on CREMA-D. In contrast to existing approaches that optimize accuracy at the expense of computational efficiency, the proposed method balances accuracy, efficiency, and deployability. These results highlight both the novelty and the feasibility of the framework for real-time emotion monitoring in healthcare and human-robot interaction.
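The abstract states only that decision-level fusion uses the geometric mean; it gives no formula. The following is a minimal sketch of one common reading, assuming each modality outputs a probability vector over the same emotion classes and that the fused scores are renormalized to sum to one. The function name `geometric_mean_fusion`, the `eps` floor, and the three-class example values are all hypothetical and are not taken from the paper.

```python
import math


def geometric_mean_fusion(face_probs, speech_probs, eps=1e-12):
    """Fuse two per-class probability vectors with a normalized geometric mean.

    Assumption: both inputs are probabilities over the same ordered class set.
    """
    if len(face_probs) != len(speech_probs):
        raise ValueError("both modalities must score the same set of classes")
    # Per-class geometric mean of the two modality scores.
    # eps (hypothetical) floors each score to avoid an all-zero fused vector.
    fused = [
        math.sqrt(max(p, eps) * max(q, eps))
        for p, q in zip(face_probs, speech_probs)
    ]
    # Renormalize so the fused scores form a probability distribution.
    total = sum(fused)
    return [f / total for f in fused]


# Hypothetical scores for three illustrative classes.
face = [0.70, 0.20, 0.10]
speech = [0.40, 0.50, 0.10]

fused = geometric_mean_fusion(face, speech)
predicted = max(range(len(fused)), key=fused.__getitem__)
```

Because the geometric mean multiplies the two scores, a class needs support from both modalities to rank highly, which is one plausible reason such a rule would dampen a single noisy modality. In this example the modalities disagree on the top class, and the fused prediction follows the facial branch because its margin is larger.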