UNetGAN: A Robust Speech Enhancement Approach in Time Domain for Extremely Low Signal-to-noise Ratio Condition

Paper Title

UNetGAN: A Robust Speech Enhancement Approach in Time Domain for Extremely Low Signal-to-noise Ratio Condition

Authors

Xiang Hao, Xiangdong Su, Zhiyu Wang, Hui Zhang, Batushiren

Abstract

Speech enhancement under extremely low signal-to-noise ratio (SNR) conditions is a very challenging problem that has rarely been investigated in previous work. This paper proposes a robust speech enhancement approach (UNetGAN) based on U-Net and generative adversarial learning to deal with this problem. The approach consists of a generator network and a discriminator network, both of which operate directly in the time domain. The generator network adopts a U-Net-like structure and employs dilated convolution in its bottleneck. We evaluate the performance of UNetGAN at low SNR conditions (as low as -20 dB) on the public benchmark. The results demonstrate that it significantly improves speech quality and substantially outperforms representative deep learning models, including SEGAN, cGAN for SE, Bidirectional LSTM using phase-sensitive spectrum approximation cost function (PSA-BLSTM), and Wave-U-Net, in terms of Short-Time Objective Intelligibility (STOI) and Perceptual Evaluation of Speech Quality (PESQ).
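To make the -20 dB figure concrete: since SNR in decibels is 10·log10(P_speech / P_noise), an SNR of -20 dB means the noise carries 100 times the power of the speech. The following is a minimal sketch of how a noisy mixture at a chosen SNR could be constructed. It is an illustration only and not the paper's data preparation, which this page does not describe; the helper names `power` and `mix_at_snr`, the 16 kHz sample count, and the sine tone standing in for speech are all assumptions.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it.

    Hypothetical helper for illustration; returns (noisy_mixture, scaled_noise).
    """
    # SNR_dB = 10 * log10(P_speech / P_noise), solved for the target noise power.
    target_noise_power = power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / power(noise))
    scaled = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled)]
    return noisy, scaled


rng = random.Random(0)
n_samples = 16000  # one second at an assumed 16 kHz sampling rate

# A 440 Hz tone as a stand-in for clean speech (assumption, not real speech).
speech = [math.sin(2 * math.pi * 440 * t / n_samples) for t in range(n_samples)]
noise = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

noisy, scaled_noise = mix_at_snr(speech, noise, snr_db=-20)

# Noise-to-speech power ratio of the mixture components.
ratio = power(scaled_noise) / power(speech)
```

Here `ratio` comes out at 100 (up to floating-point error), which is why the clean signal is almost entirely buried in the waveform a time-domain model receives at this SNR.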
