FFC-SE: Fast Fourier Convolution for Speech Enhancement

Paper Title

FFC-SE: Fast Fourier Convolution for Speech Enhancement

Authors

Ivan Shchekotov, Pavel Andreev, Oleg Ivanov, Aibek Alanov, Dmitry Vetrov

Abstract

Fast Fourier convolution (FFC) is a recently proposed neural operator that has shown promising performance on several computer vision problems. The FFC operator makes it possible to use operations with a large receptive field in the early layers of a neural network. It has been shown to be especially helpful for inpainting periodic structures, which are common in audio processing. In this work, we design neural network architectures that adapt FFC to speech enhancement. We hypothesize that a large receptive field allows these networks to produce more coherent phases than vanilla convolutional models, and we validate this hypothesis experimentally. We find that neural networks based on fast Fourier convolution outperform analogous convolutional models and achieve results better than or comparable to other speech enhancement baselines.
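
The receptive-field claim in the abstract can be illustrated with a toy example. The sketch below is not the authors' architecture: it only contrasts a pointwise multiplication in the frequency domain (the core idea behind a spectral layer) with an ordinary 3-tap convolution, and counts how many output samples change when a single input sample is perturbed. All names (`spectral_layer`, `local_conv`, `affected_positions`, `WEIGHTS`) and the fixed weights are hypothetical and chosen for illustration. A real FFC layer would use learned parameters and, as far as the abstract indicates, operates inside a larger network.

```python
import cmath

N = 8  # toy signal length (assumption for illustration)

# Fixed, symmetric real weights standing in for learned spectral parameters.
WEIGHTS = [1.0 / (1 + min(k, N - k)) for k in range(N)]


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive O(n^2) inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def spectral_layer(x, weights=WEIGHTS):
    """Scale each frequency bin, then transform back to the signal domain."""
    scaled = [w * bin_value for w, bin_value in zip(weights, dft(x))]
    return [value.real for value in idft(scaled)]


def local_conv(x, kernel=(0.25, 0.5, 0.25)):
    """Ordinary 1-D convolution with zero padding and 'same' output length."""
    n, half = len(x), len(kernel) // 2
    return [
        sum(
            kernel[j] * x[i + j - half]
            for j in range(len(kernel))
            if 0 <= i + j - half < n
        )
        for i in range(n)
    ]


def affected_positions(layer, n=N):
    """Output indices that change when only input sample 0 is perturbed."""
    baseline = layer([0.0] * n)
    perturbed = layer([1.0] + [0.0] * (n - 1))
    return [i for i in range(n) if abs(baseline[i] - perturbed[i]) > 1e-9]


if __name__ == "__main__":
    print("local conv affects:    ", affected_positions(local_conv))
    print("spectral layer affects:", affected_positions(spectral_layer))
```

In this toy setting the 3-tap convolution only changes the perturbed sample and its immediate neighbour, while the spectral layer changes every output sample after a single layer. That is the sense in which a Fourier-domain operation has a signal-wide receptive field.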

Paper Title