Paper Title
Learning the Beauty in Songs: Neural Singing Voice Beautifier
Paper Authors
Paper Abstract
We are interested in a novel task, singing voice beautifying (SVB). Given the singing voice of an amateur singer, SVB aims to improve the intonation and vocal tone of the voice while preserving the content and vocal timbre. Current automatic pitch correction techniques are immature, and most of them are restricted to intonation while ignoring overall aesthetic quality. Hence, we introduce Neural Singing Voice Beautifier (NSVB), the first generative model to solve the SVB task, which adopts a conditional variational autoencoder as the backbone and learns latent representations of vocal tone. In NSVB, we propose a novel time-warping approach for pitch correction, Shape-Aware Dynamic Time Warping (SADTW), which improves the robustness of existing time-warping approaches when synchronizing the amateur recording with the template pitch curve. Furthermore, we propose a latent-mapping algorithm in the latent space to convert the amateur vocal tone to the professional one. To achieve this, we also propose a new dataset containing parallel singing recordings of both amateur and professional versions. Extensive experiments on both Chinese and English songs demonstrate the effectiveness of our methods in terms of both objective and subjective metrics. Audio samples are available at https://neuralsvb.github.io. Code: https://github.com/MoonInTheRiver/NeuralSVB.
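To make the time-warping step concrete, the following is a minimal sketch of classic dynamic time warping (DTW) between an amateur pitch curve and a template pitch curve. It is not the paper's SADTW: it illustrates only the kind of existing time-warping baseline the abstract refers to. The function name `dtw_path`, the absolute-difference frame cost, and the toy pitch values (assumed here to be MIDI note numbers) are all illustrative assumptions, not taken from the paper or its code.

```python
def dtw_path(source, template):
    """Classic DTW between two 1-D sequences (hypothetical helper, not SADTW).

    Returns the total alignment cost and the warping path as a list of
    (source_index, template_index) pairs.
    """
    n, m = len(source), len(template)
    inf = float("inf")
    # cost[i][j]: minimal accumulated cost of aligning source[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Assumed frame-level cost: absolute pitch difference.
            d = abs(source[i - 1] - template[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end of both sequences to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),  # advance both sequences
            (cost[i - 1][j], i - 1, j),          # advance source only
            (cost[i][j - 1], i, j - 1),          # advance template only
        ]
        _, i, j = min(candidates, key=lambda c: c[0])
    path.reverse()
    return cost[n][m], path


# Toy pitch curves (assumed MIDI note numbers): same melody, different timing.
amateur = [60, 60, 62, 64, 64, 65]
template = [60, 62, 62, 64, 65]
total, path = dtw_path(amateur, template)
print(total, path)
```

Because this baseline compares individual pitch values frame by frame, an amateur recording that is consistently out of tune can pull the alignment off. The abstract states that SADTW is designed to be more robust than existing time-warping approaches; how it achieves that is not described in this abstract and is not modeled in the sketch.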