Paper Title

Biased Self-supervised learning for ASR

Paper Authors

Florian L. Kreyssig, Yangyang Shi, Jinxi Guo, Leda Sari, Abdelrahman Mohamed, Philip C. Woodland

Paper Abstract

Self-supervised learning via masked prediction pre-training (MPPT) has shown impressive performance on a range of speech-processing tasks. This paper proposes a method to bias self-supervised learning towards a specific task. The core idea is to slightly finetune the model that is used to obtain the target sequence. This leads to better performance and a substantial increase in training speed. Furthermore, this paper proposes a variant of MPPT that allows low-footprint streaming models to be trained effectively by computing the MPPT loss on masked and unmasked frames. These approaches are evaluated for automatic speech recognition on the Librispeech corpus, where 100 hours of data served as the labelled data and 860 hours as the unlabelled data. The biased training outperforms the unbiased training by 15.5% after 250k updates and 23.8% after 100k updates on test-other. For the streaming models, the pre-training approach yields a reduction in word error rate of 44.1%.
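
The MPPT variant described above computes the masked-prediction loss on both masked and unmasked frames rather than on masked frames alone. The snippet below is a minimal PyTorch sketch of that idea, not the authors' implementation; the function name `mppt_loss`, the `unmasked_weight` weighting knob, and the assumption that the teacher produces discrete frame-level targets are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def mppt_loss(student_logits, target_ids, mask, unmasked_weight=1.0):
    """Masked-prediction loss over both masked and unmasked frames (sketch).

    student_logits : (batch, time, num_targets) predictions from the streaming student.
    target_ids     : (batch, time) discrete target labels from the (biased) teacher.
    mask           : (batch, time) bool tensor, True where the input frame was masked.
    unmasked_weight: relative weight of the unmasked-frame term (hypothetical knob).
    """
    # Per-frame cross-entropy; the transpose puts the class dimension where
    # F.cross_entropy expects it: (batch, num_targets, time).
    per_frame = F.cross_entropy(
        student_logits.transpose(1, 2), target_ids, reduction="none"
    )  # shape: (batch, time)

    masked_loss = per_frame[mask].mean()      # loss on masked frames
    unmasked_loss = per_frame[~mask].mean()   # loss on unmasked frames
    return masked_loss + unmasked_weight * unmasked_loss
```

Computing the loss on unmasked frames as well gives a low-latency streaming student a learning signal on every frame, which is one plausible reading of why the paper's variant helps low-footprint streaming models.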
