Paper Title
Leveraging Text Data Using Hybrid Transformer-LSTM Based End-to-End ASR in Transfer Learning
Paper Authors
Paper Abstract
In this work, we study leveraging extra text data to improve low-resource end-to-end ASR under a cross-lingual transfer learning setting. To this end, we extend our prior work [1] and propose a hybrid Transformer-LSTM based architecture. This architecture not only takes advantage of the highly effective encoding capacity of the Transformer network but also benefits from extra text data through its LSTM-based independent language model network. We conduct experiments on our in-house Malay corpus, which contains limited labeled data and a large amount of extra text. Results show that the proposed architecture outperforms the previous LSTM-based architecture [1] by 24.2% relative word error rate (WER) when both are trained on the limited labeled data. Starting from this, we obtain a further 25.4% relative WER reduction by transfer learning from another resource-rich language. Moreover, we obtain an additional 13.6% relative WER reduction by boosting the LSTM decoder of the transferred model with the extra text data. Overall, our best model outperforms the vanilla Transformer ASR by 11.9% relative WER. Last but not least, the proposed hybrid architecture offers much faster inference than both the LSTM and Transformer architectures.
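To make the hybrid design concrete, below is a minimal PyTorch sketch of the kind of architecture the abstract describes: a Transformer encoder over acoustic features combined with an LSTM decoder that can also run as an independent language model on text alone. All module names, dimensions, and the attention-based fusion used here are illustrative assumptions, not the paper's exact implementation.

```python
# Hedged sketch: Transformer acoustic encoder + LSTM decoder that doubles as
# an independent language model. Hyperparameters and fusion are assumptions.
import torch
import torch.nn as nn


class HybridTransformerLSTMASR(nn.Module):
    def __init__(self, feat_dim=80, d_model=256, nhead=4, num_layers=6,
                 vocab_size=5000, lstm_hidden=512):
        super().__init__()
        # Transformer encoder over acoustic feature frames.
        self.input_proj = nn.Linear(feat_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=num_layers)
        # LSTM decoder conditioned only on previous tokens, so it can be
        # (pre-)trained on extra text data without any acoustic input.
        self.embed = nn.Embedding(vocab_size, lstm_hidden)
        self.lstm = nn.LSTM(lstm_hidden, lstm_hidden, num_layers=2,
                            batch_first=True)
        # Cross-attention fuses the LM states with the acoustic encoding.
        self.enc_proj = nn.Linear(d_model, lstm_hidden)
        self.attn = nn.MultiheadAttention(lstm_hidden, num_heads=4,
                                          batch_first=True)
        self.out = nn.Linear(lstm_hidden, vocab_size)

    def decode_text_only(self, prev_tokens):
        # Language-model path: usable for training on text-only corpora.
        h, _ = self.lstm(self.embed(prev_tokens))
        return self.out(h)

    def forward(self, feats, prev_tokens):
        enc = self.enc_proj(self.encoder(self.input_proj(feats)))  # (B, T, H)
        h, _ = self.lstm(self.embed(prev_tokens))                  # (B, U, H)
        ctx, _ = self.attn(h, enc, enc)                            # acoustic context
        return self.out(h + ctx)                                   # fuse LM + acoustics
```

In a design like this, decode_text_only can be trained or fine-tuned on the large extra text corpus, which is the rough analogue of the LSTM-decoder boosting step mentioned in the abstract, while the full forward pass is used for transfer learning and ASR decoding.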