Self-Supervised Voice Denoising Network for Multi-Scenario Human–Robot Interaction

Mu Li, Wenjin Xu, Chao Zeng, Ning Wang

Year: 2025
Citations: 1
Access: Open access

Abstract

Human–robot interaction (HRI) via voice command has advanced significantly in recent years, and large Vision-Language-Action (VLA) models have shown particular promise for it. However, these systems still struggle with environmental noise during voice interaction, and they lack a specialized denoising network for isolating a command from multiple speakers in overlapping-speech scenarios. To overcome these challenges, we introduce a method for voice command-based HRI in noisy environments that leverages synthetic data and a self-supervised denoising network to improve real-world applicability. Our approach focuses on improving the self-supervised network's performance in denoising mixed-noise audio by scaling the training data. Extensive experiments show that our method outperforms existing approaches in simulation and achieves 7.5% higher accuracy than the state-of-the-art method in noisy real-world environments, improving voice-guided robot control.

Keywords

Noise reduction; Noise (video); Video denoising; Speech enhancement; Voice activity detection; Reduction (mathematics); Speech processing; Robot
