Summary on the ISCSLP 2022 Chinese-English Code-Switching ASR Challenge

Paper Title

Summary on the ISCSLP 2022 Chinese-English Code-Switching ASR Challenge

Authors

Deng, Shuhao; Li, Chengfei; Bai, Jinfeng; Zhang, Qingqing; Zhang, Wei-Qiang; Yang, Runyan; Cheng, Gaofeng; Zhang, Pengyuan; Yan, Yonghong

Abstract

Code-switching automatic speech recognition has become one of the most challenging and most valuable scenarios in automatic speech recognition, because speakers switch between languages within an utterance and because code-switching occurs frequently in daily life. The ISCSLP 2022 Chinese-English Code-Switching Automatic Speech Recognition (CSASR) Challenge aims to promote the development of code-switching automatic speech recognition. The challenge provided participants with two training sets, the TAL_CSASR corpus and the MagicData-RAMC corpus, together with a development set and a test set, for CSASR model training and evaluation. Along with the challenge, we also provided baseline system performance for reference. More than 40 teams participated, and the winning team achieved a Mixture Error Rate (MER) of 16.70% on the test set, an absolute MER improvement of 9.8% over the baseline system. In this paper, we describe the datasets, the associated baseline system, and the requirements, and we summarize the CSASR challenge results and the major techniques and tricks used in the submitted systems.
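The abstract reports results as Mixture Error Rate (MER) but does not define it. As a rough illustration only, the sketch below assumes the convention commonly used for Mandarin-English code-switching: Mandarin is scored per character, English per word, and the total edit errors are divided by the total number of reference tokens. The function names, the tokenization rule, and the lowercasing are assumptions made for this example, not the challenge's official scoring script.

```python
import re

# Assumption: one token per CJK character, one token per English word.
# Digits and punctuation are ignored in this simplified tokenizer.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z']+")


def tokenize(text):
    """Split mixed text into Chinese characters and lowercased English words."""
    return [t.lower() for t in _TOKEN.findall(text)]


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def mixture_error_rate(refs, hyps):
    """Corpus-level MER: total edit errors over total reference tokens."""
    errors = sum(
        edit_distance(tokenize(r), tokenize(h)) for r, h in zip(refs, hyps)
    )
    total = sum(len(tokenize(r)) for r in refs)
    return errors / total
```

For instance, with the reference "我想要一个 apple" (six tokens under this tokenizer) and the hypothesis "我想要一个 app", there is one substitution, so `mixture_error_rate` returns 1/6.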
