NASTAR: Noise Adaptive Speech Enhancement with Target-Conditional Resampling

Paper Title

NASTAR: Noise Adaptive Speech Enhancement with Target-Conditional Resampling

Authors

Lee, Chi-Chang; Hu, Cheng-Hung; Lin, Yu-Chen; Chen, Chu-Song; Wang, Hsin-Min; Tsao, Yu

Abstract

For deep learning-based speech enhancement (SE) systems, a training-test acoustic mismatch can cause notable performance degradation. To address this mismatch, numerous noise adaptation strategies have been proposed. In this paper, we propose a novel method, called noise adaptive speech enhancement with target-conditional resampling (NASTAR), which reduces the mismatch using only one sample (one-shot) of noisy speech from the target environment. NASTAR uses a feedback mechanism to simulate adaptive training data via a noise extractor and a retrieval model. The noise extractor estimates the target noise from the noisy speech; this estimate is called the pseudo-noise. The noise retrieval model retrieves relevant noise samples from a pool of noise signals according to the noisy speech; the retrieved set is called the relevant cohort. The pseudo-noise and the relevant-cohort set are jointly sampled and mixed with the source speech corpus to prepare simulated training data for noise adaptation. Experimental results show that NASTAR can effectively use one noisy speech sample to adapt an SE model to a target condition. Moreover, both the noise extractor and the noise retrieval model contribute to model adaptation. To the best of our knowledge, NASTAR is the first work to perform one-shot noise adaptation through noise extraction and retrieval.
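The data-simulation step described in the abstract (retrieve a relevant cohort, sample jointly from it and the pseudo-noise, mix with source speech) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how relevance is measured, how the two noise sources are weighted, or which SNRs are used, so the cosine-similarity top-k retrieval, the `p_pseudo` sampling probability, the SNR list, and all function names below are assumptions chosen for illustration. Signals are plain lists of floats, and the noise extractor itself is assumed to have already produced `pseudo_noise`.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_cohort(query_emb, pool_embs, k):
    """Return indices of the k pool noises most similar to the query embedding.

    Assumption: relevance is cosine similarity between fixed-size embeddings.
    """
    ranked = sorted(
        range(len(pool_embs)),
        key=lambda i: cosine(query_emb, pool_embs[i]),
        reverse=True,
    )
    return ranked[:k]


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the speech-to-noise power ratio is snr_db.

    Both inputs must be non-empty; the noise is tiled or cropped to the speech length.
    """
    reps = len(speech) // len(noise) + 1
    noise = (list(noise) * reps)[: len(speech)]
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_noise == 0:
        return list(speech)  # silent noise: nothing to add
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def simulate_adaptation_pairs(clean_utts, pseudo_noise, pool, cohort_idx,
                              snrs_db, p_pseudo=0.5, rng=None):
    """Build (noisy, clean) training pairs for adaptation.

    For each clean utterance, draw the pseudo-noise with probability p_pseudo
    (hypothetical weighting), otherwise a random member of the relevant cohort.
    """
    rng = rng or random.Random(0)
    pairs = []
    for clean in clean_utts:
        if rng.random() < p_pseudo or not cohort_idx:
            noise = pseudo_noise
        else:
            noise = pool[rng.choice(cohort_idx)]
        noisy = mix_at_snr(clean, noise, rng.choice(snrs_db))
        pairs.append((noisy, clean))
    return pairs
```

In a setup like this, the resulting pairs would be used to fine-tune a pretrained SE model on the target condition. Both the embeddings and the pseudo-noise would come from the paper's learned retrieval and extraction models, which this sketch does not attempt to reproduce.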
