Paper title
Investigating self-supervised learning for speech enhancement and separation
Paper authors
Paper abstract
Speech enhancement and separation are two fundamental tasks for robust speech processing. Speech enhancement suppresses background noise, while speech separation extracts target speech from interfering speakers. Although numerous supervised learning-based enhancement and separation methods have been proposed and achieve good performance, studies on applying self-supervised learning (SSL) to enhancement and separation remain limited. In this paper, we evaluate 13 SSL upstream methods on the speech enhancement and separation downstream tasks. Our experimental results on Voicebank-DEMAND and Libri2Mix show that some SSL representations consistently outperform baseline features, including the short-time Fourier transform (STFT) magnitude and log Mel filterbank (FBANK). Furthermore, we analyze the factors that make existing SSL frameworks difficult to apply to speech enhancement and separation, and we discuss the representation properties desired for both tasks. Our study is included as the official speech enhancement and separation downstream tasks in SUPERB.
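The abstract compares SSL representations against two classical baseline features, STFT magnitude and log Mel filterbank. A minimal NumPy sketch of how such features are typically computed follows; the specific parameters (400-sample Hann window, 160-sample hop, 40 mel bands at 16 kHz) are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

def stft_magnitude(x, n_fft=400, hop=160):
    """Frame the waveform, apply a Hann window, and take |rFFT| per frame."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))  # (n_frames, n_fft//2 + 1)

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def log_mel_fbank(mag, sr=16000, n_fft=400, n_mels=40, eps=1e-10):
    """Apply triangular mel filters to the magnitude spectrogram, then log."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):   # rising slope of triangle m
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling slope of triangle m
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    return np.log(mag @ fbank.T + eps)  # (n_frames, n_mels)

sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440.0 * t)  # 1 s of a 440 Hz tone as a toy input
mag = stft_magnitude(x)            # STFT magnitude baseline feature
feats = log_mel_fbank(mag)         # log Mel filterbank (FBANK) baseline feature
```

In the SUPERB-style setup described in the abstract, features like `mag` or `feats` would feed the downstream enhancement/separation model in place of an SSL upstream's hidden representations.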