High Fidelity Speech Enhancement with Band-split RNN

Paper title

High Fidelity Speech Enhancement with Band-split RNN

Authors

Jianwei Yu, Yi Luo, Hangting Chen, Rongzhi Gu, Chao Weng

Abstract

Despite rapid progress in speech enhancement (SE) research, enhancing the quality of desired speech in environments with strong noise and interfering speakers remains challenging. In this paper, we extend the application of the recently proposed band-split RNN (BSRNN) model to full-band SE and personalized SE (PSE) tasks. To mitigate the effects of unstable high-frequency components in full-band speech, we perform bi-directional and uni-directional band-level modeling on low-frequency and high-frequency subbands, respectively. For the PSE task, we incorporate a speaker enrollment module into BSRNN to utilize target speaker information. Moreover, we utilize a MetricGAN discriminator (MGD) and a multi-resolution spectrogram discriminator (MRSD) to improve perceptual quality metrics. Experimental results show that our system outperforms various top-ranking SE systems, achieves state-of-the-art (SOTA) results on the DNS-2020 test set, and ranks among the top 3 in the DNS-2023 challenge.
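
The band-split step mentioned in the abstract can be illustrated with a rough sketch: cut a full-band spectrogram into contiguous subbands and note which modeling direction each subband would receive. The sample rate, FFT size, band edges, and low/high cutoff below are hypothetical values chosen only for illustration; the abstract does not state them, and `split_bands` is an illustrative helper, not the authors' code.

```python
import numpy as np

def split_bands(spec, sample_rate, band_edges_hz):
    """Split a (freq_bins, frames) spectrogram into contiguous subbands."""
    n_bins = spec.shape[0]
    nyquist = sample_rate / 2
    # Map each edge in Hz to the nearest FFT bin index.
    edges = [int(round(hz / nyquist * (n_bins - 1))) for hz in band_edges_hz]
    edges[-1] = n_bins  # let the last band include the Nyquist bin
    return [spec[lo:hi] for lo, hi in zip(edges[:-1], edges[1:])]

# Hypothetical settings (assumptions, not taken from the paper).
SAMPLE_RATE = 48000
N_FFT = 960
BAND_EDGES_HZ = [0, 1000, 2000, 4000, 8000, 16000, 24000]
LOW_HIGH_CUTOFF_HZ = 8000  # assumed boundary between low and high subbands

# Random complex spectrogram standing in for an STFT of full-band speech.
rng = np.random.default_rng(0)
shape = (N_FFT // 2 + 1, 100)
spec = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

bands = split_bands(spec, SAMPLE_RATE, BAND_EDGES_HZ)

# Low-frequency subbands are tagged for bi-directional band-level modeling,
# high-frequency subbands for uni-directional modeling, as the abstract describes.
directions = [
    "bidirectional" if lo_hz < LOW_HIGH_CUTOFF_HZ else "unidirectional"
    for lo_hz in BAND_EDGES_HZ[:-1]
]

for band, direction in zip(bands, directions):
    print(band.shape, direction)
```

The sketch only covers partitioning and labeling; the RNN layers, the speaker enrollment module, and the discriminators are outside its scope.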
