Can Speaker Augmentation Improve Multi-Speaker End-to-End TTS?

Paper Title

Can Speaker Augmentation Improve Multi-Speaker End-to-End TTS?

Authors

Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Junichi Yamagishi

Abstract

Previous work on speaker adaptation for end-to-end speech synthesis still falls short in speaker similarity. We investigate an orthogonal approach to the current speaker adaptation paradigms, speaker augmentation, by creating artificial speakers and by taking advantage of low-quality data. The base Tacotron2 model is modified to account for the channel and dialect factors inherent in these corpora. In addition, we describe a warm-start training strategy that we adopted for Tacotron2 training. A large-scale listening test is conducted, and a distance metric is adopted to evaluate synthesis of dialects. This is followed by an analysis on synthesis quality, speaker and dialect similarity, and a remark on the effectiveness of our speaker augmentation approach. Audio samples are available online.
