SE-MelGAN -- Speaker Agnostic Rapid Speech Enhancement

Paper title

SE-MelGAN -- Speaker Agnostic Rapid Speech Enhancement

Authors

Luka Chkhetiani, Levan Bejanidze

Abstract

Recent advances in Generative Adversarial Networks in the speech synthesis domain [3], [2] have shown that it is possible to train GANs [8] reliably to generate high-quality, coherent waveforms from mel-spectrograms. We propose that MelGAN's [3] robustness in learning speech features can be transferred to the speech enhancement and noise reduction domain without any modification to the model. Our proposed method generalizes over a multi-speaker speech dataset and robustly handles unseen background noises during inference. We also show that increasing the batch size for this particular approach not only yields better speech results, but also generalizes easily over a multi-speaker dataset and leads to faster convergence. Additionally, it outperforms SEGAN [5], the previous state-of-the-art GAN approach for speech enhancement, in two respects: 1. quality; 2. speed. The proposed method runs more than 100x faster than real time on GPU and more than 2x faster than real time on CPU without any hardware optimization, matching the speed of MelGAN [3].
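The speed figures in the abstract ("100x faster than real time") compare the duration of the audio to the time spent processing it. A minimal sketch of that ratio follows; the function name `realtime_speedup` and the sample rate and timings in the usage lines are illustrative assumptions, not values or code from the paper.

```python
def realtime_speedup(num_samples: int, sample_rate: int, elapsed_seconds: float) -> float:
    """Return how many times faster than real time a model ran.

    Hypothetical helper for illustration: the paper does not publish
    its timing code. A value of 1.0 means exactly real time.
    """
    if sample_rate <= 0 or elapsed_seconds <= 0:
        raise ValueError("sample_rate and elapsed_seconds must be positive")
    # Duration of the processed audio in seconds.
    audio_seconds = num_samples / sample_rate
    # Ratio of audio duration to wall-clock processing time.
    return audio_seconds / elapsed_seconds


# Assumed example: 10 seconds of 22050 Hz audio.
samples = 22050 * 10
print(realtime_speedup(samples, 22050, 0.5))  # processed in 0.5 s
print(realtime_speedup(samples, 22050, 5.0))  # processed in 5.0 s
```

Under this definition, the GPU claim corresponds to a ratio above 100 and the CPU claim to a ratio above 2.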
