MTI-Net: A Multi-Target Speech Intelligibility Prediction Model

Paper Title

MTI-Net: A Multi-Target Speech Intelligibility Prediction Model

Authors

Zezario, Ryandhimas E.; Fu, Szu-wei; Chen, Fei; Fuh, Chiou-Shann; Wang, Hsin-Min; Tsao, Yu

Abstract

Recently, deep learning (DL)-based non-intrusive speech assessment models have attracted great attention. Many studies report that these DL-based models yield satisfactory assessment performance and good flexibility, but their performance in unseen environments remains a challenge. Furthermore, compared to quality scores, fewer studies have developed deep learning models to estimate intelligibility scores. This study proposes a multi-task speech intelligibility prediction model, called MTI-Net, for simultaneously predicting human and machine intelligibility measures. Specifically, given a speech utterance, MTI-Net is designed to predict human subjective listening test results and word error rate (WER) scores. We also investigate several methods that can improve the prediction performance of MTI-Net. First, we compare different features (including low-level features and embeddings from self-supervised learning (SSL) models) and prediction targets of MTI-Net. Second, we explore the effect of transfer learning and multi-task learning on training MTI-Net. Finally, we examine the potential advantages of fine-tuning SSL embeddings. Experimental results demonstrate the effectiveness of using cross-domain features, multi-task learning, and fine-tuning SSL embeddings. Furthermore, the intelligibility and WER scores predicted by MTI-Net are shown to be highly correlated with the ground-truth scores.
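
The abstract names word error rate (WER) as the machine intelligibility target but does not define it. As a point of reference, the sketch below computes WER in its standard form: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer's hypothesis, divided by the number of reference words. The function name `word_error_rate` is illustrative and not taken from the paper, and the whitespace tokenization is an assumption; the ASR system and text normalization used by the authors are not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance / number of reference words.

    Illustrative helper, not the authors' implementation. Tokenization by
    whitespace is an assumption made for this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` counts one deletion over six reference words, giving 1/6. Because insertions are counted, WER can exceed 1 when the hypothesis is much longer than the reference.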
