Paper Title
Improved Self-Supervised Multilingual Speech Representation Learning Combined with Auxiliary Language Information
Paper Authors
Paper Abstract
Multilingual end-to-end models have shown great improvement over monolingual systems. With the development of pre-training methods for speech, self-supervised multilingual speech representation learning such as XLSR has been successful in improving the performance of multilingual automatic speech recognition (ASR). However, as in supervised learning, multilingual pre-training may also suffer from language interference, which further affects the application of multilingual systems. In this paper, we introduce several techniques for improving self-supervised multilingual pre-training by leveraging auxiliary language information, including language adversarial training, language embedding, and language adaptive training during the pre-training stage. We conduct experiments on a multilingual ASR task consisting of 16 languages. Our experimental results demonstrate a 14.3% relative gain over the standard XLSR model and a 19.8% relative gain over the multilingual model without pre-training.
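The abstract does not spell out how these techniques are attached to the pre-training objective. As a rough illustration only, the sketch below shows one common realization of language adversarial training: a language-ID classifier connected to the encoder output through a gradient reversal layer, so that minimizing the classifier's loss pushes the encoder toward language-invariant features. All module names, dimensions, and the loss weighting are assumptions for illustration, not details taken from the paper.

# A minimal sketch (not the authors' code) of language adversarial training
# via a gradient reversal layer (GRL). Module names, dimensions, and the
# loss weighting are illustrative assumptions.
import torch
import torch.nn as nn


class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; flips and scales gradients backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reversed gradient flows into the encoder; no gradient for lambd.
        return -ctx.lambd * grad_output, None


class LanguageAdversarialHead(nn.Module):
    """Predicts the language ID from encoder features through a GRL, so the
    encoder is trained to remove language-discriminative information."""

    def __init__(self, feat_dim: int, num_languages: int, lambd: float = 1.0):
        super().__init__()
        self.lambd = lambd
        self.classifier = nn.Linear(feat_dim, num_languages)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, time, feat_dim) from the pre-training encoder
        reversed_feats = GradientReversal.apply(features, self.lambd)
        pooled = reversed_feats.mean(dim=1)   # simple temporal mean pooling
        return self.classifier(pooled)        # (batch, num_languages) logits


# During pre-training, the adversarial cross-entropy term would be added to
# the self-supervised (e.g., XLSR contrastive) objective, roughly:
#   loss = ssl_loss + adv_weight * F.cross_entropy(lang_logits, lang_ids)

The other two techniques named in the abstract would plug in differently: a language embedding would typically be looked up from the utterance's language ID and added to (or concatenated with) the encoder input, while language adaptive training would condition or specialize part of the network per language; the paper itself defines the exact formulations.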