Paper Title

Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training

Paper Authors

Chengyi Wang, Yiming Wang, Yu Wu, Sanyuan Chen, Jinyu Li, Shujie Liu, Furu Wei

Paper Abstract

Recently, masked prediction pre-training has seen remarkable progress in self-supervised learning (SSL) for speech recognition. It usually requires a codebook obtained in an unsupervised way, making it less accurate and difficult to interpret. We propose two supervision-guided codebook generation approaches to improve automatic speech recognition (ASR) performance and also the pre-training efficiency, either through decoding with a hybrid ASR system to generate phoneme-level alignments (named PBERT), or performing clustering on the supervised speech features extracted from an end-to-end CTC model (named CTC clustering). Both the hybrid and CTC models are trained on the same small amount of labeled speech as used in fine-tuning. Experiments demonstrate significant superiority of our methods to various SSL and self-training baselines, with up to 17.0% relative WER reduction. Our pre-trained models also show good transferability in a non-ASR speech task.
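The CTC-clustering approach described above amounts to running k-means over frame-level features from a supervised model and using the resulting cluster indices as discrete prediction targets. The sketch below illustrates that step only, not the paper's full pipeline: `build_codebook` is a hypothetical helper, the input array stands in for CTC hidden states, and the simple deterministic initialization is an assumption for illustration.

```python
import numpy as np

def build_codebook(features, num_codes, num_iters=20):
    """Cluster frame-level features into a discrete codebook via Lloyd's k-means.

    features: (num_frames, dim) array of supervised speech features,
              e.g. hidden states extracted from a CTC model (stand-in here).
    Returns (codebook, labels): (num_codes, dim) centroids and the
    per-frame code index each frame is assigned to.
    """
    # Simple deterministic init: centroids from evenly spaced frames
    # (an illustrative choice; k-means++ would be more robust).
    idx = np.linspace(0, len(features) - 1, num_codes).astype(int)
    centroids = features[idx].astype(float).copy()
    for _ in range(num_iters):
        # Assign each frame to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old centroid if a cluster is empty.
        for k in range(num_codes):
            if np.any(labels == k):
                centroids[k] = features[labels == k].mean(axis=0)
    return centroids, labels

# Toy usage: two well-separated groups of "frames" yield two distinct codes,
# which would then serve as masked-prediction targets during pre-training.
feats = np.concatenate([np.zeros((10, 4)), np.full((10, 4), 5.0)])
codebook, labels = build_codebook(feats, num_codes=2)
```

Phoneme-level alignments from a hybrid ASR system (the PBERT variant) would play the same role as `labels` here, just with linguistically meaningful units instead of unsupervised cluster indices.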
