Paper Title

Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training

Paper Authors

Chengyi Wang, Yiming Wang, Yu Wu, Sanyuan Chen, Jinyu Li, Shujie Liu, Furu Wei

Paper Abstract

Recently, masked prediction pre-training has seen remarkable progress in self-supervised learning (SSL) for speech recognition. It usually requires a codebook obtained in an unsupervised way, making it less accurate and difficult to interpret. We propose two supervision-guided codebook generation approaches to improve automatic speech recognition (ASR) performance and also the pre-training efficiency, either through decoding with a hybrid ASR system to generate phoneme-level alignments (named PBERT), or performing clustering on the supervised speech features extracted from an end-to-end CTC model (named CTC clustering). Both the hybrid and CTC models are trained on the same small amount of labeled speech as used in fine-tuning. Experiments demonstrate significant superiority of our methods to various SSL and self-training baselines, with up to 17.0% relative WER reduction. Our pre-trained models also show good transferability in a non-ASR speech task.
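The CTC-clustering approach described above amounts to running k-means over frame-level features from a supervised model and using the resulting cluster indices as discrete prediction targets. The sketch below illustrates that step only, not the paper's full pipeline: `build_codebook` is a hypothetical helper, the input array stands in for CTC hidden states, and the simple deterministic initialization is an assumption for illustration.

```python
import numpy as np

def build_codebook(features, num_codes, num_iters=20):
    """Cluster frame-level features into a discrete codebook via Lloyd's k-means.

    features: (num_frames, dim) array of supervised speech features,
              e.g. hidden states extracted from a CTC model (stand-in here).
    Returns (codebook, labels): (num_codes, dim) centroids and the
    per-frame code index each frame is assigned to.
    """
    # Simple deterministic init: centroids from evenly spaced frames
    # (an illustrative choice; k-means++ would be more robust).
    idx = np.linspace(0, len(features) - 1, num_codes).astype(int)
    centroids = features[idx].astype(float).copy()
    for _ in range(num_iters):
        # Assign each frame to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old centroid if a cluster is empty.
        for k in range(num_codes):
            if np.any(labels == k):
                centroids[k] = features[labels == k].mean(axis=0)
    return centroids, labels

# Toy usage: two well-separated groups of "frames" yield two distinct codes,
# which would then serve as masked-prediction targets during pre-training.
feats = np.concatenate([np.zeros((10, 4)), np.full((10, 4), 5.0)])
codebook, labels = build_codebook(feats, num_codes=2)
```

Phoneme-level alignments from a hybrid ASR system (the PBERT variant) would play the same role as `labels` here, just with linguistically meaningful units instead of unsupervised cluster indices.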
