Weighted Speech Distortion Losses for Neural-network-based Real-time Speech Enhancement

Paper Title

Weighted Speech Distortion Losses for Neural-network-based Real-time Speech Enhancement

Authors

Yangyang Xia, Sebastian Braun, Chandan K. A. Reddy, Harishchandra Dubey, Ross Cutler, Ivan Tashev

Abstract

This paper investigates several aspects of training an RNN (recurrent neural network) that impact the objective and subjective quality of enhanced speech for real-time single-channel speech enhancement. Specifically, we focus on an RNN that enhances short-time speech spectra on a single-frame-in, single-frame-out basis, a framework adopted by most classical signal processing methods. We propose two novel mean-squared-error-based learning objectives that enable separate control over the importance of speech distortion versus noise reduction. The proposed loss functions are evaluated by widely accepted objective quality and intelligibility measures and compared to other competitive online methods. In addition, we study the impact of feature normalization and varying batch sequence lengths on the objective quality of enhanced speech. Finally, we show subjective ratings for the proposed approach and a state-of-the-art real-time RNN-based method.
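The abstract does not state the loss formulas, so the following is only an illustrative sketch of what "separate control over speech distortion versus noise reduction" can look like in a mean-squared-error setting. It assumes a real-valued gain applied per frequency bin to one frame, with the clean-speech and noise magnitudes known at training time. The function name, the flat-list layout, and the fixed weight `alpha` are all hypothetical choices made for this example and are not taken from the paper.

```python
def weighted_distortion_loss(gain, speech_mag, noise_mag, alpha=0.5):
    """Illustrative loss: alpha * speech distortion + (1 - alpha) * residual noise.

    gain, speech_mag, noise_mag are equal-length lists of per-bin values for a
    single frame (a hypothetical layout chosen for this sketch).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not (len(gain) == len(speech_mag) == len(noise_mag)) or not gain:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(gain)
    # Speech distortion term: mean squared amount of clean speech the gain removes.
    distortion = sum((s - g * s) ** 2 for g, s in zip(gain, speech_mag)) / n
    # Residual noise term: mean squared amount of noise the gain lets through.
    residual = sum((g * v) ** 2 for g, v in zip(gain, noise_mag)) / n
    return alpha * distortion + (1.0 - alpha) * residual
```

Under these assumptions, a larger `alpha` penalizes removing speech more heavily, while a smaller `alpha` penalizes leftover noise more heavily. A gain of all ones gives zero distortion, and a gain of all zeros gives zero residual noise.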
