Boosting Cross-Domain Speech Recognition with Self-Supervision

Paper Title

Boosting Cross-Domain Speech Recognition with Self-Supervision

Authors

Zhu, Han; Cheng, Gaofeng; Wang, Jindong; Hou, Wenxin; Zhang, Pengyuan; Yan, Yonghong

Abstract

The cross-domain performance of automatic speech recognition (ASR) could be severely hampered due to the mismatch between training and testing distributions. Since the target domain usually lacks labeled data, and domain shifts exist at acoustic and linguistic levels, it is challenging to perform unsupervised domain adaptation (UDA) for ASR. Previous work has shown that self-supervised learning (SSL) or pseudo-labeling (PL) is effective in UDA by exploiting the self-supervisions of unlabeled data. However, these self-supervisions also face performance degradation in mismatched domain distributions, which previous work fails to address. This work presents a systematic UDA framework to fully utilize the unlabeled data with self-supervision in the pre-training and fine-tuning paradigm. On the one hand, we apply continued pre-training and data replay techniques to mitigate the domain mismatch of the SSL pre-trained model. On the other hand, we propose a domain-adaptive fine-tuning approach based on the PL technique with three unique modifications: Firstly, we design a dual-branch PL method to decrease the sensitivity to the erroneous pseudo-labels; Secondly, we devise an uncertainty-aware confidence filtering strategy to improve pseudo-label correctness; Thirdly, we introduce a two-step PL approach to incorporate target domain linguistic knowledge, thus generating more accurate target domain pseudo-labels. Experimental results on various cross-domain scenarios demonstrate that the proposed approach effectively boosts the cross-domain performance and significantly outperforms previous approaches.
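The abstract names an uncertainty-aware confidence filtering strategy but does not describe how it works. The sketch below illustrates only the general idea and is not the paper's method. It keeps a pseudo-label when two conditions hold:

- its average confidence is high;
- several stochastic decoding passes of the same utterance largely agree.

The stochastic passes are assumed to come from something like dropout-enabled inference. Everything else here is hypothetical: the `Hypothesis` type, the `filter_pseudo_labels` and `uncertainty` functions, the word-level disagreement measure, and both thresholds.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Hypothesis:
    text: str          # decoded transcript for one decoding pass
    confidence: float  # assumed utterance-level confidence in [0, 1]


def edit_distance(a: list[str], b: list[str]) -> int:
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (wa != wb)))   # substitution or match
        prev = cur
    return prev[-1]


def uncertainty(hyps: list[Hypothesis]) -> float:
    """Mean normalized disagreement between the first pass (assumed to be the
    candidate pseudo-label) and the remaining stochastic passes."""
    ref = hyps[0].text.split()
    others = hyps[1:]
    if not others:
        return 0.0
    return mean(edit_distance(ref, h.text.split()) / max(len(ref), 1)
                for h in others)


def filter_pseudo_labels(batch: dict[str, list[Hypothesis]],
                         conf_threshold: float = 0.9,
                         unc_threshold: float = 0.1) -> dict[str, str]:
    """Keep an utterance's pseudo-label only if it is both confident and stable
    across passes. The thresholds are illustrative, not tuned values."""
    kept: dict[str, str] = {}
    for utt_id, hyps in batch.items():
        if not hyps:
            continue
        conf = mean(h.confidence for h in hyps)
        if conf >= conf_threshold and uncertainty(hyps) <= unc_threshold:
            kept[utt_id] = hyps[0].text
    return kept
```

Consider an utterance whose passes all decode to "the cat sat" with confidences around 0.94: it is kept. An utterance whose passes disagree on one word in three is dropped, even when every pass reports high confidence. This is the motivation for pairing confidence with an uncertainty estimate.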
