Title
Phase Aware Speech Enhancement using Realisation of Complex-valued LSTM
Authors
Abstract
Most deep-learning-based speech enhancement (SE) methods rely on estimating the magnitude spectrum of the clean speech signal from the observed noisy speech signal, either by magnitude spectral masking or by regression. These methods reuse the noisy phase when synthesizing the time-domain waveform from the estimated magnitude spectrum. However, recent works have highlighted the importance of phase in SE. An earlier attempt estimated the complex ratio mask, which takes phase into account, using a complex-valued feed-forward neural network (FFNN). Yet FFNNs cannot capture the sequential information that is essential for phase estimation. In this work, we propose a realisation of a complex-valued long short-term memory (RCLSTM) network that estimates the complex ratio mask (CRM) using sequential information along time. The proposed RCLSTM processes complex-valued sequences using complex arithmetic, and hence it preserves the dependencies between the real and imaginary parts of the CRM and thereby the phase. The proposed method is evaluated on noisy speech mixtures formed from the Voice-Bank corpus and the DEMAND database. Compared with real-valued masking methods, the proposed RCLSTM improves several objective measures, including the perceptual evaluation of speech quality (PESQ), on which it improves by over 4.3%.
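The abstract's central point is that complex arithmetic couples the real and imaginary parts of the mask. The minimal NumPy sketch below illustrates this, assuming complex STFT arrays as input. `ideal_crm` computes the ideal mask M = S / Y from a clean spectrum S and a noisy spectrum Y. `apply_crm` applies that mask as a complex product. `complex_linear` shows how a complex-valued weight matrix can act on complex input using only real matrix products. All function names and the `eps` guard are hypothetical. This is not the authors' RCLSTM implementation, and their gate equations are not reproduced here.

```python
import numpy as np


def ideal_crm(clean, noisy, eps=1e-8):
    """Ideal complex ratio mask M = S / Y, returned as (real, imag) parts.

    clean, noisy: complex STFT arrays of the same shape (frames x bins).
    eps is a hypothetical guard against division by zero in silent bins.
    """
    sr, si = clean.real, clean.imag
    yr, yi = noisy.real, noisy.imag
    denom = yr**2 + yi**2 + eps          # |Y|^2 (+ eps)
    m_real = (yr * sr + yi * si) / denom
    m_imag = (yr * si - yi * sr) / denom
    return m_real, m_imag


def apply_crm(m_real, m_imag, noisy):
    """S_hat = M * Y as a complex product written in real arithmetic.

    Each output part uses both mask parts, so magnitude and phase of the
    noisy spectrum are corrected jointly rather than reusing the noisy phase.
    """
    yr, yi = noisy.real, noisy.imag
    est_real = m_real * yr - m_imag * yi
    est_imag = m_real * yi + m_imag * yr
    return est_real + 1j * est_imag


def complex_linear(w_real, w_imag, x_real, x_imag):
    """(W_r + iW_i)(x_r + ix_i) computed with real matrix products only.

    This is the kind of operation a complex-valued LSTM gate needs; the
    paper's actual gate equations and activations are not reproduced here.
    """
    out_real = w_real @ x_real - w_imag @ x_imag
    out_imag = w_real @ x_imag + w_imag @ x_real
    return out_real, out_imag


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = (4, 5)  # hypothetical: 4 frames x 5 frequency bins
    S = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    N = 0.3 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
    Y = S + N

    m_real, m_imag = ideal_crm(S, Y, eps=0.0)
    print(np.allclose(apply_crm(m_real, m_imag, Y), S))  # True

    # A real-valued magnitude mask restores |S| but keeps the noisy phase.
    mag_est = (np.abs(S) / np.abs(Y)) * Y
    print(np.allclose(np.abs(mag_est), np.abs(S)))  # True
    print(np.allclose(mag_est, S))                  # False

    W = rng.standard_normal((2, 5)) + 1j * rng.standard_normal((2, 5))
    x = Y[0]
    out_real, out_imag = complex_linear(W.real, W.imag, x.real, x.imag)
    print(np.allclose(out_real + 1j * out_imag, W @ x))  # True
```

The magnitude-mask lines mirror the limitation the abstract describes. That estimate recovers |S| exactly but leaves the noisy phase in place, so it still differs from S. The complex mask corrects both magnitude and phase.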