Language-specific Characteristic Assistance for Code-switching Speech Recognition

Paper Title

Language-specific Characteristic Assistance for Code-switching Speech Recognition

Authors

Tongtong Song, Qiang Xu, Meng Ge, Longbiao Wang, Hao Shi, Yongjie Lv, Yuqin Lin, Jianwu Dang

Abstract

The dual-encoder structure has been used successfully for code-switching speech recognition with two language-specific encoders (LSEs). Because the LSEs are initialized from two pre-trained language-specific models (LSMs), the dual-encoder structure can exploit abundant monolingual data and capture the attributes of each language. However, most existing methods place no language constraints on the LSEs and underutilize the language-specific knowledge of the LSMs. In this paper, we propose a language-specific characteristic assistance (LSCA) method to mitigate these problems. Specifically, during training, we introduce two language-specific losses as language constraints and generate corresponding language-specific targets for them. During decoding, we take the decoding abilities of the LSMs into account by combining the output probabilities of the two LSMs and the mixture model to obtain the final predictions. Experiments show that either the training or the decoding method of LSCA improves the model's performance on its own. Combining the two yields the best result, with up to 15.4% relative error reduction on the code-switching test set. Moreover, with our method, a system built from two pre-trained LSMs can handle code-switching speech recognition well without extra shared parameters or even retraining.
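
The abstract names two mechanisms without giving their details: language-specific targets for the training losses, and a combination of three output distributions at decoding time. The Python sketch below is one possible reading of each, not the authors' implementation. Everything concrete in it is an assumption made for illustration: the `<other>` placeholder token, the per-token language labels, the weighted log-linear combination rule, and the weights `w_mix`, `w_a`, and `w_b`.

```python
import math

# Hypothetical placeholder for tokens of the other language (an assumption,
# not a token defined in the paper).
OTHER = "<other>"


def language_specific_targets(tokens, langs, keep):
    """Build a target for one language-specific loss.

    Tokens whose language label equals `keep` are preserved; all other
    tokens are replaced by the OTHER placeholder. This masking scheme is
    an illustrative assumption.
    """
    if len(tokens) != len(langs):
        raise ValueError("tokens and langs must have the same length")
    return [tok if lang == keep else OTHER for tok, lang in zip(tokens, langs)]


def combine_log_probs(mix, lsm_a, lsm_b, w_mix=0.6, w_a=0.2, w_b=0.2):
    """Combine three per-token log-probability tables over a shared vocabulary.

    Uses a weighted sum in the log domain, then renormalizes so the result
    is again a log-probability distribution. The rule and the default
    weights are assumptions for this sketch.
    """
    scores = {
        tok: w_mix * mix[tok] + w_a * lsm_a[tok] + w_b * lsm_b[tok]
        for tok in mix
    }
    log_z = math.log(sum(math.exp(s) for s in scores.values()))
    return {tok: s - log_z for tok, s in scores.items()}


if __name__ == "__main__":
    # Toy code-switched transcript with hypothetical language labels.
    tokens = ["我", "like", "音乐"]
    langs = ["zh", "en", "zh"]
    print(language_specific_targets(tokens, langs, keep="zh"))
    print(language_specific_targets(tokens, langs, keep="en"))

    # Toy one-step distributions from the mixture model and the two LSMs.
    mix = {"hello": math.log(0.5), "你好": math.log(0.5)}
    lsm_a = {"hello": math.log(0.9), "你好": math.log(0.1)}
    lsm_b = {"hello": math.log(0.4), "你好": math.log(0.6)}
    combined = combine_log_probs(mix, lsm_a, lsm_b)
    print(max(combined, key=combined.get))
```

In this toy setup the mixture model is undecided between the two tokens, so the LSM scores break the tie. A real system would apply such a combination at each step of beam search and tune the weights on held-out data.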
