Unsupervised Model-based speaker adaptation of end-to-end lattice-free MMI model for speech recognition

Paper title

Unsupervised Model-based speaker adaptation of end-to-end lattice-free MMI model for speech recognition

Authors

Xurong Xie, Xunying Liu, Hui Chen, Hongan Wang

Abstract

Modeling speaker variability is a key challenge for automatic speech recognition (ASR) systems. In this paper, learning hidden unit contributions (LHUC) based adaptation techniques with compact speaker-dependent (SD) parameters are used to facilitate both speaker adaptive training (SAT) and unsupervised test-time speaker adaptation for end-to-end (E2E) lattice-free MMI (LF-MMI) models. An unsupervised model-based adaptation framework is proposed to estimate the SD parameters in the E2E paradigm using the LF-MMI and cross-entropy (CE) criteria. Various regularization methods for standard LHUC adaptation, e.g., Bayesian LHUC (BLHUC) adaptation, are systematically investigated on E2E LF-MMI CNN-TDNN and CNN-TDNN-BLSTM models to mitigate the risk of overfitting. Lattice-based confidence score estimation is used for adaptation data selection to reduce the supervision label uncertainty. Experiments on the 300-hour Switchboard task suggest that applying BLHUC in the proposed unsupervised E2E adaptation framework to byte pair encoding (BPE) based E2E LF-MMI systems consistently outperforms the baseline systems, with relative word error rate (WER) reductions of up to 10.5% and 14.7% on the NIST Hub5'00 and RT03 evaluation sets, and achieves the best WERs of 9.0% and 9.7%, respectively. These results are comparable to those of state-of-the-art adapted LF-MMI hybrid systems and adapted Conformer-based E2E systems.
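As a rough illustration of the two ingredients named in the abstract, the sketch below shows the commonly used LHUC form, in which each hidden activation is rescaled by a speaker-dependent amplitude 2·sigmoid(r) in (0, 2), followed by a simple confidence-threshold filter for choosing adaptation utterances. The abstract does not give the paper's exact parameterization, the Bayesian (BLHUC) treatment, or the selection rule, so the function names, the `confidence` field, and the threshold are assumptions made purely for illustration.

```python
import math


def lhuc_scale(hidden, sd_params):
    """Rescale hidden activations with speaker-dependent (SD) LHUC parameters.

    Assumes the common LHUC form: amplitude = 2 * sigmoid(r), so each
    amplitude lies in (0, 2) and r = 0 leaves the activation unchanged.
    """
    if len(hidden) != len(sd_params):
        raise ValueError("hidden and sd_params must have the same length")
    return [2.0 / (1.0 + math.exp(-r)) * h for h, r in zip(hidden, sd_params)]


def select_by_confidence(utterances, threshold):
    """Keep utterances whose confidence score reaches the threshold.

    Hypothetical selection rule: each utterance is a dict with a
    'confidence' key; the paper's lattice-based scoring is not reproduced.
    """
    return [u for u in utterances if u["confidence"] >= threshold]


# With all SD parameters at zero the layer output is unchanged.
identity = lhuc_scale([1.0, -2.0], [0.0, 0.0])

# Only the confidently recognized utterance is kept for adaptation.
kept = select_by_confidence(
    [{"id": "utt1", "confidence": 0.9}, {"id": "utt2", "confidence": 0.4}],
    threshold=0.7,
)
```

Because only one scalar per hidden unit is estimated per speaker, the SD parameter set stays compact, which is what makes unsupervised test-time adaptation on limited data plausible.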
