Single-microphone speaker separation and voice activity detection in noisy and reverberant environments
Renana Opochinsky, Mordehay Moradi, Sharon Gannot
- Year: 2025
- Citations: 4
- Access: Open access
Abstract
The increasing complexity of real-world environments, where multiple speakers may talk simultaneously, underscores the importance of effective speech separation techniques. This work presents a single-microphone speaker separation network with time-frequency (TF) attention, aimed at noisy and reverberant environments. We dub this new architecture the Separation TF Attention Network (Sep-TFAnet). Additionally, we introduce a variant, Sep-TFAnetVAD, which incorporates a voice activity detector (VAD) into the separation network. The separation module is based on a temporal convolutional network (TCN) backbone inspired by the Conv-TasNet architecture, with several modifications. Instead of a learned encoder and decoder, we employ the short-time Fourier transform (STFT) for analysis and the inverse short-time Fourier transform (iSTFT) for synthesis. Our system is specifically developed for human-robot interaction and supports a block-processing mode. While considerable progress has been made in separating overlapping speech signals, most studies have focused on mixtures of speech signals with simulated reverberation, leaving real-world scenarios largely unexplored. To address this limitation, we introduce the ARImulti-mic dataset, which comprises real-world recordings carried out in the acoustic laboratory at Bar-Ilan University and captured by a humanoid robot. Throughout this paper, we focus on the single-microphone setting. Extensive evaluation of the proposed methods on this dataset and on carefully simulated data demonstrates advantages over competing methods. The ARImulti-mic dataset is available at DataPort, and examples of our algorithm applied to this dataset can be found on the project page: https://Sep-TFAnet.github.io.
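The fixed STFT analysis and iSTFT synthesis that replace Conv-TasNet's learned encoder and decoder can be illustrated with a minimal sketch. This is not the authors' Sep-TFAnet. It assumes a mask-based formulation, as in Conv-TasNet, and substitutes oracle ideal ratio masks computed from the clean sources for the TCN-with-TF-attention mask estimator. It also uses a toy energy-threshold VAD in place of the learned VAD in Sep-TFAnetVAD. The sampling rate, window length, hop size, VAD threshold, and all helper names (`analysis`, `synthesis`, `oracle_masks`, `separate`, `frame_vad`) are hypothetical choices made for illustration.

```python
import numpy as np
from scipy.signal import stft, istft

FS = 16_000          # sampling rate in Hz (assumed)
N_FFT = 512          # STFT window length in samples (assumed: 32 ms at 16 kHz)
HOP = N_FFT // 2     # hop size, 50% overlap (assumed)


def analysis(x):
    """Fixed STFT analysis, used here in place of a learned encoder."""
    _, _, X = stft(x, fs=FS, window="hann", nperseg=N_FFT, noverlap=N_FFT - HOP)
    return X  # complex array, shape (N_FFT // 2 + 1, frames)


def synthesis(X, length):
    """Fixed iSTFT synthesis, used here in place of a learned decoder."""
    _, x = istft(X, fs=FS, window="hann", nperseg=N_FFT, noverlap=N_FFT - HOP)
    return x[:length]  # trim the padding added during analysis


def oracle_masks(sources, eps=1e-8):
    """Ideal ratio masks from the clean sources.

    Hypothetical stand-in for the masks a separation network would estimate.
    """
    mags = np.stack([np.abs(analysis(s)) for s in sources])  # (speakers, F, T)
    return mags / (mags.sum(axis=0, keepdims=True) + eps)


def separate(mixture, masks):
    """Multiply the mixture STFT by one real-valued mask per speaker, then resynthesize."""
    Y = analysis(mixture)
    return np.stack([synthesis(m * Y, len(mixture)) for m in masks])


def frame_vad(x, frame=HOP, threshold_db=-40.0):
    """Toy energy VAD: 1 where a frame is within threshold_db of the loudest frame."""
    n = len(x) // frame
    energy = np.array([np.sum(x[i * frame:(i + 1) * frame] ** 2) for i in range(n)]) + 1e-12
    return (10 * np.log10(energy / energy.max()) > threshold_db).astype(int)


# Usage: two synthetic "speakers" (pure tones); speaker 1 is silent in the second half.
t = np.arange(FS) / FS
s1 = np.where(t < 0.5, 0.5 * np.sin(2 * np.pi * 440 * t), 0.0)
s2 = 0.5 * np.sin(2 * np.pi * 1500 * t)
mix = s1 + s2

est = separate(mix, oracle_masks([s1, s2]))  # shape (2, len(mix))
vad1 = frame_vad(est[0])                     # per-frame activity of estimated speaker 1
print(vad1[:31].min(), vad1[33:].max())      # active in the first half, silent in the second
```

Because the STFT/iSTFT pair with a Hann window at 50% overlap reconstructs its input exactly, any separation error in such a pipeline comes from the masks alone; a fixed transform also keeps frame timing explicit, which is convenient when processing audio block by block.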