Voice-preserving Zero-shot Multiple Accent Conversion

Paper title

Voice-preserving Zero-shot Multiple Accent Conversion

Authors

Mumin Jin, Prashant Serai, Jilong Wu, Andros Tjandra, Vimal Manohar, Qing He

Abstract

Most people who have tried to learn a foreign language have experienced difficulty understanding or speaking with a native speaker's accent. For native speakers, understanding or speaking a new accent is likewise a difficult task. An accent conversion system that changes a speaker's accent but preserves that speaker's voice identity, such as timbre and pitch, has potential applications in communication, language learning, and entertainment. Existing accent conversion models tend to change the speaker identity and accent at the same time. Here, we use adversarial learning to disentangle accent-dependent features while retaining other acoustic characteristics. What sets our work apart from existing accent conversion models is the capability to convert an unseen speaker's utterance to multiple accents while preserving the speaker's original voice identity. Subjective evaluations show that our model generates audio that sounds closer to the target accent and like the original speaker.
