Paper Title

Reducing Language Confusion for Code-switching Speech Recognition with Token-level Language Diarization

Paper Authors

Hexin Liu, Haihua Xu, Leibny Paola Garcia, Andy W. H. Khong, Yi He, Sanjeev Khudanpur

Paper Abstract

Code-switching (CS) refers to the phenomenon in which languages switch within a speech signal, which leads to language confusion for automatic speech recognition (ASR). This paper aims to address language confusion to improve CS-ASR from two perspectives: incorporating and disentangling language information. We incorporate language information into the CS-ASR model by dynamically biasing the model with token-level language posteriors, which are the outputs of a sequence-to-sequence auxiliary language diarization (LD) module. In contrast, the disentangling process reduces the difference between languages via adversarial training so as to normalize the two languages. We conduct experiments on the SEAME dataset. Compared to the baseline model, both joint optimization with LD and the language posterior bias improve performance. A comparison of the proposed methods indicates that incorporating language information is more effective than disentangling it for reducing language confusion in CS speech.
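The abstract outlines two mechanisms: biasing the ASR model with token-level language posteriors produced by an auxiliary LD module, and adversarial training that normalizes the two languages. The PyTorch sketch below illustrates how such components are commonly realized; the module names, tensor shapes, additive bias, and gradient-reversal classifier are illustrative assumptions and are not taken from the paper's actual implementation.

```python
# Hypothetical sketch of (1) biasing decoder inputs with token-level language
# posteriors and (2) an adversarial (gradient-reversal) language classifier.
# All names, dimensions, and injection points are assumptions for illustration.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses and scales gradients backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class LanguageBiasedDecoderInput(nn.Module):
    """Adds a projection of token-level language posteriors to decoder inputs."""
    def __init__(self, d_model: int, num_langs: int = 2):
        super().__init__()
        self.lang_proj = nn.Linear(num_langs, d_model)

    def forward(self, dec_in: torch.Tensor, lang_post: torch.Tensor) -> torch.Tensor:
        # dec_in:    (batch, tokens, d_model) token embeddings fed to the decoder
        # lang_post: (batch, tokens, num_langs) posteriors from the auxiliary LD head
        return dec_in + self.lang_proj(lang_post)


class AdversarialLanguageHead(nn.Module):
    """Language classifier trained through a gradient-reversal layer so that
    the encoder is pushed toward language-agnostic representations."""
    def __init__(self, d_model: int, num_langs: int = 2, lambd: float = 1.0):
        super().__init__()
        self.lambd = lambd
        self.classifier = nn.Linear(d_model, num_langs)

    def forward(self, enc_out: torch.Tensor) -> torch.Tensor:
        # enc_out: (batch, frames, d_model) encoder features
        reversed_feats = GradReverse.apply(enc_out, self.lambd)
        return self.classifier(reversed_feats)  # per-frame language logits
```

In a joint setup of this kind, the total training loss would typically combine the ASR loss with the auxiliary LD loss (and, for the adversarial variant, the language-classification loss applied through the gradient-reversal layer), with the weighting tuned on a development set.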
