Paper Title
Adapt-and-Adjust: Overcoming the Long-Tail Problem of Multilingual Speech Recognition
Paper Authors
Abstract
One crucial challenge of real-world multilingual speech recognition is the long-tailed distribution problem, where some resource-rich languages like English have abundant training data, but a long tail of low-resource languages have varying amounts of limited training data. To overcome the long-tail problem, in this paper, we propose Adapt-and-Adjust (A2), a transformer-based multi-task learning framework for end-to-end multilingual speech recognition. The A2 framework overcomes the long-tail problem via three techniques: (1) exploiting a pretrained multilingual language model (mBERT) to improve the performance of low-resource languages; (2) proposing dual adapters consisting of both language-specific and language-agnostic adaptation with minimal additional parameters; and (3) overcoming the class imbalance, either by imposing class priors in the loss during training or adjusting the logits of the softmax output during inference. Extensive experiments on the CommonVoice corpus show that A2 significantly outperforms conventional approaches.
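The third technique, adjusting the softmax logits at inference time, can be illustrated with a minimal sketch of post-hoc logit adjustment: each class logit is offset by the (scaled) log of its class prior, so that head classes no longer dominate tail classes purely because of their frequency. The function name, the `tau` scaling parameter, and the toy prior values below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def adjust_logits(logits, class_priors, tau=1.0):
    """Post-hoc logit adjustment for class imbalance (illustrative sketch).

    Subtracting tau * log(prior) penalizes frequent (head) classes and
    boosts rare (tail) classes before the softmax is applied.
    """
    return logits - tau * np.log(class_priors)

# Toy example: 3 classes with heavily imbalanced priors.
priors = np.array([0.90, 0.08, 0.02])   # head, mid, tail class frequencies
logits = np.array([2.0, 1.5, 1.4])      # raw model scores favor the head class

adjusted = adjust_logits(logits, priors, tau=1.0)
probs = np.exp(adjusted) / np.exp(adjusted).sum()

# Without adjustment the head class wins; after adjustment the
# tail class's score is boosted by its large -log(prior) offset.
```

The analogous training-time variant adds the same log-prior term inside the loss rather than at inference, which is the other option the abstract mentions.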