Online Monaural Speech Enhancement Using Delayed Subband LSTM

Paper title

Online Monaural Speech Enhancement Using Delayed Subband LSTM

Authors

Xiaofei Li, Radu Horaud

Abstract

This paper proposes a delayed subband LSTM network for online monaural (single-channel) speech enhancement. The method is developed in the short-time Fourier transform (STFT) domain. Online processing requires frame-by-frame signal reception and processing. A key feature of the method is that the same LSTM is used across frequencies, which drastically reduces the number of network parameters, the amount of training data and the computational burden. Training is performed in a subband manner: the input consists of one frequency together with a few context frequencies. The network learns a speech-to-noise discriminative function that relies on signal stationarity and on the local spectral pattern, and uses it to predict a clean-speech mask at each frequency. To exploit future information (look-ahead), we propose an output-delayed subband architecture, which allows the unidirectional forward network to process a few future frames in addition to the current frame. We use the proposed method to participate in the DNS real-time speech enhancement challenge. Experiments with the DNS dataset show that the proposed method scores better on the performance measures than the DNS baseline method, which learns the full-band spectra using a gated recurrent unit network.
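
The subband input and the output delay described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `subband_inputs` and `delayed_targets`, the clamping of context indices at the band edges, and the use of `None` for frames without a target are all assumptions made for illustration, and the abstract does not state the number of context frequencies or the delay length. Plain Python lists are used so the example runs without dependencies.

```python
def subband_inputs(mag, n_ctx):
    """Build one input sequence per frequency from an STFT magnitude.

    mag   : list of F rows (frequencies), each a list of T frame magnitudes.
    n_ctx : number of context frequencies on each side (hypothetical parameter).

    Returns F sequences; each is a list of T vectors of length 2 * n_ctx + 1.
    Every sequence would be fed to the same (shared) LSTM.
    """
    n_freq = len(mag)
    n_frames = len(mag[0])
    sequences = []
    for f in range(n_freq):
        # Neighbouring frequency indices, clamped at the band edges
        # (edge handling is an assumption, not taken from the abstract).
        idx = [min(max(f + d, 0), n_freq - 1) for d in range(-n_ctx, n_ctx + 1)]
        sequences.append([[mag[k][t] for k in idx] for t in range(n_frames)])
    return sequences


def delayed_targets(mask, delay):
    """Shift training targets so that step t predicts the mask of frame t - delay.

    At step t a forward-only network has already seen frames up to t, so it
    has `delay` frames of look-ahead relative to the frame it outputs.
    The first `delay` steps have no valid target and are marked None.
    """
    shifted = []
    for row in mask:
        d = min(delay, len(row))
        shifted.append([None] * d + row[:len(row) - d])
    return shifted
```

With a 3-frequency, 3-frame input and one context frequency per side, `subband_inputs` yields three sequences of 3-dimensional vectors, and `delayed_targets` with a delay of one frame moves every target one step later in time.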
