Paper Title

Uncertainty Estimation in Deep Speech Enhancement Using Complex Gaussian Mixture Models

Authors

Huajian Fang, Timo Gerkmann

Abstract

Single-channel deep speech enhancement approaches often estimate a single multiplicative mask to extract clean speech without a measure of its accuracy. Instead, in this work, we propose to quantify the uncertainty associated with clean speech estimates in neural network-based speech enhancement. Predictive uncertainty is typically categorized into aleatoric uncertainty and epistemic uncertainty. The former accounts for the inherent uncertainty in data and the latter corresponds to the model uncertainty. Aiming for robust clean speech estimation and efficient predictive uncertainty quantification, we propose to integrate statistical complex Gaussian mixture models (CGMMs) into a deep speech enhancement framework. More specifically, we model the dependency between input and output stochastically by means of a conditional probability density and train a neural network to map the noisy input to the full posterior distribution of clean speech, modeled as a mixture of multiple complex Gaussian components. Experimental results on different datasets show that the proposed algorithm effectively captures predictive uncertainty and that combining powerful statistical models and deep learning also delivers a superior speech enhancement performance.
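
As a concrete illustration of the modeling described in the abstract, the sketch below shows one way a complex Gaussian mixture posterior over clean-speech STFT coefficients could be evaluated in PyTorch, together with the resulting MMSE clean-speech estimate and its aleatoric uncertainty (the posterior variance). The tensor shapes, the number of components K, and the function names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a network head is assumed to predict,
# per time-frequency bin, K mixture weights, K complex component means, and
# K positive component variances of a circularly-symmetric complex Gaussian mixture.
import math
import torch

def cgmm_log_likelihood(s, weights, means, variances):
    """Log-likelihood of complex STFT targets s under a K-component
    complex Gaussian mixture.

    s:         (..., )  complex clean-speech coefficients
    weights:   (..., K) mixture weights (sum to 1 over K)
    means:     (..., K) complex component means
    variances: (..., K) positive component variances
    """
    # log N_C(s; mu_k, var_k) = -log(pi * var_k) - |s - mu_k|^2 / var_k
    diff2 = (s.unsqueeze(-1) - means).abs() ** 2
    log_comp = -torch.log(math.pi * variances) - diff2 / variances
    # log-sum-exp over components for numerical stability
    return torch.logsumexp(torch.log(weights) + log_comp, dim=-1)

def posterior_mean_and_uncertainty(weights, means, variances):
    """MMSE clean-speech estimate and aleatoric uncertainty (posterior variance)."""
    mean = (weights * means).sum(dim=-1)                                # sum_k pi_k mu_k
    second_moment = (weights * (variances + means.abs() ** 2)).sum(dim=-1)
    variance = second_moment - mean.abs() ** 2                          # law of total variance
    return mean, variance

# Illustrative usage with dummy shapes (batch, frequency bins, frames, components).
B, F, T, K = 2, 257, 100, 3
s = torch.randn(B, F, T, dtype=torch.complex64)                        # clean STFT targets
weights = torch.softmax(torch.randn(B, F, T, K), dim=-1)               # mixture weights
means = torch.randn(B, F, T, K, dtype=torch.complex64)                 # component means
variances = torch.nn.functional.softplus(torch.randn(B, F, T, K)) + 1e-6

loss = -cgmm_log_likelihood(s, weights, means, variances).mean()       # training objective
s_hat, aleatoric_var = posterior_mean_and_uncertainty(weights, means, variances)
```

Training would minimize the negative log-likelihood averaged over time-frequency bins; the posterior variance above covers only the aleatoric part of the predictive uncertainty, while epistemic (model) uncertainty would require a separate mechanism not covered by this sketch.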
