Paper title
Scaling Up Deliberation for Multilingual ASR
Paper authors
Paper abstract
Multilingual end-to-end automatic speech recognition models are attractive due to their simplicity in training and deployment. Recent work on large-scale training of such models has shown promising results compared to monolingual models. However, that work typically focuses on multilingual models in a single-pass setup. In this work, we investigate second-pass deliberation for multilingual speech recognition. Our proposed deliberation is multilingual: the text encoder encodes hypothesis text from multiple languages, and the decoder attends to both multilingual text and audio. We investigate scaling the deliberation text encoder and decoder, and compare scaling the deliberation decoder with scaling the first-pass cascaded encoder. We show that deliberation improves the average WER across 9 languages by 4% relative compared to the single-pass model. By increasing the deliberation model size up to 1B parameters, the average WER improvement grows to 9% relative, with up to 14% for certain languages. Our deliberation rescorer is based on transformer layers and can be parallelized during rescoring.
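The core mechanism the abstract describes is a second-pass decoder that cross-attends jointly to first-pass audio encodings and to a text encoding of the first-pass hypothesis. The following is a minimal NumPy sketch of that joint attention step only; all shapes, variable names (`audio_enc`, `hyp_enc`, `dec_states`), and the single-head, unprojected attention are illustrative assumptions, not the paper's actual architecture or dimensions.

```python
import numpy as np


def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys, values):
    # scaled dot-product attention (single head, no learned projections)
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ values


rng = np.random.default_rng(0)
d_model = 16  # hypothetical model dimension

audio_enc = rng.normal(size=(20, d_model))  # first-pass audio encoder frames
hyp_enc = rng.normal(size=(8, d_model))     # text-encoder output for the multilingual hypothesis
dec_states = rng.normal(size=(8, d_model))  # deliberation decoder states (queries)

# The decoder attends to both sources: here we simply concatenate
# audio and text encodings along the sequence axis before attention.
source = np.concatenate([audio_enc, hyp_enc], axis=0)
context = cross_attention(dec_states, source, source)
print(context.shape)  # (8, 16)
```

Because this rescorer is transformer-based rather than recurrent, all decoder positions (`dec_states` rows above) can be processed in one batched attention call, which is what enables the parallel rescoring the abstract mentions.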