HRI

Self-Supervised Voice Denoising Network for Multi-Scenario Human–Robot Interaction

Mu Li, Wenjin Xu, Chao Zeng, Ning Wang

Year: 2025
Citations: 1
Access: Open access

Abstract

Human–robot interaction (HRI) via voice command has advanced significantly in recent years, with large Vision-Language-Action (VLA) models showing particular promise. However, these systems still struggle with environmental noise during voice interaction and lack a specialized denoising network for isolating a command when multiple speakers overlap. To address these challenges, we introduce a method for voice command-based HRI in noisy environments that leverages synthetic data and a self-supervised denoising network to improve real-world applicability. Our approach focuses on improving the self-supervised network's performance in denoising mixed-noise audio by scaling the training data. Extensive experiments show that our method outperforms existing approaches in simulation and achieves 7.5% higher accuracy than the state-of-the-art method in noisy real-world environments, enhancing voice-guided robot control.

Keywords

Noise reduction, Noise (video), Video denoising, Speech enhancement, Voice activity detection, Reduction (mathematics), Speech processing, Robot
