MOSRA: Joint Mean Opinion Score and Room Acoustics Speech Quality Assessment

Paper Title

MOSRA: Joint Mean Opinion Score and Room Acoustics Speech Quality Assessment

Authors

Karl El Hajal, Milos Cernak, Pablo Mainar

Abstract

The acoustic environment can degrade speech quality during communication (e.g., video call, remote presentation, outside voice recording), and its impact is often unknown. Objective metrics for speech quality have proven challenging to develop given the multi-dimensionality of factors that affect speech quality and the difficulty of collecting labeled data. Hypothesizing the impact of acoustics on speech quality, this paper presents MOSRA: a non-intrusive multi-dimensional speech quality metric that can predict room acoustics parameters (SNR, STI, T60, DRR, and C50) alongside the overall mean opinion score (MOS) for speech quality. By explicitly optimizing the model to learn these room acoustics parameters, we can extract more informative features and improve the generalization for the MOS task when the training data is limited. Furthermore, we also show that this joint training method enhances the blind estimation of room acoustics, improving the performance of current state-of-the-art models. An additional side-effect of this joint prediction is the improvement in the explainability of the predictions, which is a valuable feature for many applications.
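The joint training idea in the abstract can be illustrated with a small sketch. The abstract does not specify the loss, the task weights, or how partially labeled data is handled, so everything below is an assumption made for illustration: the per-task loss is taken to be mean squared error, the joint loss is a weighted sum over tasks, and tasks without labels in a batch are simply skipped. The names `TARGETS`, `mse`, and `joint_loss` are hypothetical and do not come from the paper.

```python
from typing import Dict, Optional, Sequence

# Assumed task names: MOS plus the five room acoustics parameters
# listed in the abstract (SNR, STI, T60, DRR, C50).
TARGETS = ("MOS", "SNR", "STI", "T60", "DRR", "C50")


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length, non-empty sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal in length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def joint_loss(
    preds: Dict[str, Sequence[float]],
    labels: Dict[str, Sequence[float]],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted sum of per-task MSE (an assumed form of the joint objective).

    Tasks that have no labels in this batch are skipped, so data labeled
    only with MOS or only with room acoustics parameters can still be used.
    Tasks without an explicit weight default to 1.0.
    """
    weights = weights or {}
    total = 0.0
    for name in TARGETS:
        if name in labels:
            total += weights.get(name, 1.0) * mse(preds[name], labels[name])
    return total


if __name__ == "__main__":
    # Toy batch of two utterances with labels for MOS and T60 only.
    preds = {"MOS": [3.0, 4.0], "T60": [0.5, 0.7]}
    labels = {"MOS": [3.0, 5.0], "T60": [0.5, 0.5]}
    print(joint_loss(preds, labels))
```

In a setup like this, the shared feature extractor would receive gradients from every labeled task, which is one plausible way the room acoustics targets could help the MOS task when MOS labels are scarce.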
