Paper title

Data-augmented cross-lingual synthesis in a teacher-student framework

Authors

Marcel de Korte, Jaebok Kim, Aki Kunikoshi, Adaeze Adigwe, Esther Klabbers

Abstract

Cross-lingual synthesis can be defined as the task of letting a speaker generate fluent synthetic speech in another language. This is a challenging task, and the resulting speech can suffer from reduced naturalness, accented speech, and/or loss of essential voice characteristics. Previous research shows that many models appear to lack the generalization capabilities needed to perform well on each of these cross-lingual aspects. To overcome these generalization problems, we propose to apply the teacher-student paradigm to cross-lingual synthesis. While a teacher model is commonly used to produce teacher-forced data, we propose to also use it to produce augmented data for unseen speaker-language pairs, where the aim is to retain essential speaker characteristics. Both sets of data are then used to train the student model, which learns to retain the naturalness and prosodic variation present in the teacher-forced data while learning the speaker identity from the augmented data. Some modifications to the student model are proposed to make the separation of teacher-forced and augmented data more straightforward. Results show that the proposed approach improves the retention of speaker characteristics in the speech, while managing to retain high levels of naturalness and prosodic variation.
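
The split between the two data sources described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `build_student_plan`, the speaker IDs, the language codes, and the `source` labels are all hypothetical, and the sketch only assumes that speaker-language pairs with recordings yield teacher-forced data while every other pair is synthesized by the teacher as augmented data.

```python
from itertools import product


def build_student_plan(seen_pairs, speakers, languages):
    """Label every speaker-language pair with the data source it would use.

    Hypothetical helper: the labels only illustrate how teacher-forced and
    augmented data could be kept apart for student training.
    """
    seen = set(seen_pairs)
    plan = []
    for speaker, language in product(speakers, languages):
        # Assumption: pairs with recordings get teacher-forced data;
        # unseen pairs get teacher-synthesized (augmented) data.
        source = "teacher_forced" if (speaker, language) in seen else "augmented"
        plan.append((speaker, language, source))
    return plan


# Hypothetical corpus: each speaker was recorded in one language only.
seen_pairs = [("spk_a", "en"), ("spk_b", "nl")]
plan = build_student_plan(seen_pairs, ["spk_a", "spk_b"], ["en", "nl"])
for row in plan:
    print(row)
```

Carrying an explicit source label on each item is one simple way to let a student model treat the two kinds of data differently; the abstract does not specify how the proposed model modifications achieve this.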
