Paper title
Speaker- and Age-Invariant Training for Child Acoustic Modeling Using Adversarial Multi-Task Learning
Paper authors
Paper abstract
One of the major challenges in acoustic modelling of child speech is the rapid change in children's articulators as they grow, together with their differing growth rates and the resulting high variability within the same age group. This high acoustic variability, along with the scarcity of child speech corpora, has impeded the development of reliable speech recognition systems for children. In this paper, a speaker- and age-invariant training approach based on adversarial multi-task learning is proposed. The system consists of one shared generator network, which learns to generate speaker- and age-invariant features, connected to three discriminator networks for phoneme, age, and speaker. The generator network is trained to minimize the phoneme-discrimination loss and to maximize the speaker- and age-discrimination losses in an adversarial multi-task learning fashion. The generator is a Time Delay Neural Network (TDNN), while the three discriminators are feed-forward networks. The system was applied to the OGI speech corpora and achieved a 13% reduction in the word error rate (WER) of the automatic speech recognition (ASR) system.
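The training objective described in the abstract can be illustrated with a minimal sketch. The abstract gives no loss formula or weighting, so everything below is an assumption made for illustration: the function names, the per-frame softmax cross-entropy, and the weights `lam_speaker` and `lam_age` are hypothetical and are not taken from the paper.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one frame; `target` is the true class index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def generator_objective(phoneme_loss, speaker_loss, age_loss,
                        lam_speaker=1.0, lam_age=1.0):
    """Quantity the shared generator would minimize (assumed form).

    Minimizing it lowers the phoneme loss while raising the speaker and
    age losses, which pushes the shared features toward speaker and age
    invariance. The weights are hypothetical hyperparameters.
    """
    return phoneme_loss - lam_speaker * speaker_loss - lam_age * age_loss


# One frame scored by the three discriminator heads (made-up logits).
phoneme_loss = cross_entropy([2.0, 0.5, -1.0], target=0)
speaker_loss = cross_entropy([0.1, 0.3], target=1)
age_loss = cross_entropy([0.0, 0.0, 0.0], target=2)

total = generator_objective(phoneme_loss, speaker_loss, age_loss)
print(f"generator objective for this frame: {total:.4f}")
```

Each discriminator would still minimize its own cross-entropy; only the gradient reaching the generator from the speaker and age heads changes sign. In practice this sign flip is often implemented with a gradient reversal layer, though the abstract does not say which mechanism the authors use.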