Paper Title
Leveraging Text Data Using Hybrid Transformer-LSTM Based End-to-End ASR in Transfer Learning
Paper Authors
Paper Abstract
In this work, we study leveraging extra text data to improve low-resource end-to-end ASR under a cross-lingual transfer learning setting. To this end, we extend our prior work [1] and propose a hybrid Transformer-LSTM based architecture. This architecture not only takes advantage of the highly effective encoding capacity of the Transformer network but also benefits from extra text data through its LSTM-based independent language model network. We conduct experiments on our in-house Malay corpus, which contains limited labeled data and a large amount of extra text. Results show that the proposed architecture outperforms the previous LSTM-based architecture [1] by 24.2% relative word error rate (WER) when both are trained on the limited labeled data. Starting from this, we obtain a further 25.4% relative WER reduction by transfer learning from another resource-rich language. Moreover, we obtain an additional 13.6% relative WER reduction by boosting the LSTM decoder of the transferred model with the extra text data. Overall, our best model outperforms the vanilla Transformer ASR by 11.9% relative WER. Last but not least, the proposed hybrid architecture offers much faster inference than both the LSTM and Transformer architectures.
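To make the hybrid design concrete, below is a minimal PyTorch sketch of the kind of architecture the abstract describes: a Transformer encoder over acoustic features combined with an LSTM decoder that can also run as an independent language model on text alone. All module names, dimensions, and the attention-based fusion used here are illustrative assumptions, not the paper's exact implementation.

```python
# Hedged sketch: Transformer acoustic encoder + LSTM decoder that doubles as
# an independent language model. Hyperparameters and fusion are assumptions.
import torch
import torch.nn as nn


class HybridTransformerLSTMASR(nn.Module):
    def __init__(self, feat_dim=80, d_model=256, nhead=4, num_layers=6,
                 vocab_size=5000, lstm_hidden=512):
        super().__init__()
        # Transformer encoder over acoustic feature frames.
        self.input_proj = nn.Linear(feat_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=num_layers)
        # LSTM decoder conditioned only on previous tokens, so it can be
        # (pre-)trained on extra text data without any acoustic input.
        self.embed = nn.Embedding(vocab_size, lstm_hidden)
        self.lstm = nn.LSTM(lstm_hidden, lstm_hidden, num_layers=2,
                            batch_first=True)
        # Cross-attention fuses the LM states with the acoustic encoding.
        self.enc_proj = nn.Linear(d_model, lstm_hidden)
        self.attn = nn.MultiheadAttention(lstm_hidden, num_heads=4,
                                          batch_first=True)
        self.out = nn.Linear(lstm_hidden, vocab_size)

    def decode_text_only(self, prev_tokens):
        # Language-model path: usable for training on text-only corpora.
        h, _ = self.lstm(self.embed(prev_tokens))
        return self.out(h)

    def forward(self, feats, prev_tokens):
        enc = self.enc_proj(self.encoder(self.input_proj(feats)))  # (B, T, H)
        h, _ = self.lstm(self.embed(prev_tokens))                  # (B, U, H)
        ctx, _ = self.attn(h, enc, enc)                            # acoustic context
        return self.out(h + ctx)                                   # fuse LM + acoustics
```

In a design like this, decode_text_only can be trained or fine-tuned on the large extra text corpus, which is the rough analogue of the LSTM-decoder boosting step mentioned in the abstract, while the full forward pass is used for transfer learning and ASR decoding.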