Paper title
DRVC: A Framework of Any-to-Any Voice Conversion with Self-Supervised Learning
Paper authors
Paper abstract
Any-to-any voice conversion aims to convert voices between source and target speakers that do not appear in the training data. Previous works widely use disentanglement-based models, which assume that speech consists of content and speaker style information and aim to separate the two so that the style information can be changed for conversion. These works focus on reducing the dimension of the speech representation to obtain the content information. However, the appropriate size is hard to determine, which leads to overlap between the disentangled content and style information. We propose the Disentangled Representation Voice Conversion (DRVC) model to address this issue. DRVC is an end-to-end self-supervised model consisting of a content encoder, a timbre encoder, and a generator. Instead of reducing the speech dimension to obtain content, as previous works do, we propose a cycle that constrains the disentanglement through a Cycle Reconstruct Loss and a Same Loss. Experiments show improvements in both the quality and the voice similarity of the converted speech.
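The abstract names the two losses but does not define them, so the following is only a toy sketch of how a cycle constraint of this kind might be computed. Everything in it is an assumption for illustration: the mean-based "encoders", the additive "generator", the L1 distance, and the reading of the Same Loss as an identity-style reconstruction are hypothetical stand-ins, not the DRVC networks or their actual loss definitions.

```python
from statistics import fmean

# Toy stand-ins (hypothetical): a "speech" signal is a list of floats.
# Timbre is modelled as the mean offset, content as the zero-mean residual.

def content_encoder(x):
    """Return the zero-mean part of x (toy content representation)."""
    m = fmean(x)
    return [v - m for v in x]

def timbre_encoder(x):
    """Return the mean of x (toy speaker/timbre representation)."""
    return fmean(x)

def generator(content, timbre):
    """Recombine content and timbre into a signal."""
    return [c + timbre for c in content]

def convert(source, target):
    """Keep the content of `source`, take the timbre of `target`."""
    return generator(content_encoder(source), timbre_encoder(target))

def l1(a, b):
    """Mean absolute difference between two equal-length signals."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)

def cycle_reconstruct_loss(source, target):
    """Convert source to the target timbre, convert back, compare with source."""
    converted = convert(source, target)
    restored = generator(content_encoder(converted), timbre_encoder(source))
    return l1(restored, source)

def same_loss(source):
    """Assumed reading: converting a signal to its own timbre should change nothing."""
    return l1(convert(source, source), source)

source = [0.2, 0.5, -0.1, 0.4]
target = [1.0, 1.3, 0.9, 1.2]
cycle = cycle_reconstruct_loss(source, target)
same = same_loss(source)
```

In this toy setting content and timbre separate exactly, so both losses come out at essentially zero. With learned encoders they would not, and minimizing them is what would push the two representations apart without fixing a content bottleneck size by hand.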