论文标题
端到端双耳语音综合
End-to-End Binaural Speech Synthesis
论文作者
论文摘要
在这项工作中,我们提出了一个端到端双耳语音合成系统,该系统将低抑制音频编解码器与强大的双耳解码器结合在一起,该解码器能够准确地进行语音双耳化,同时忠实地重建环境因素,例如环境噪声或混响。该网络是经过修改的矢量定量自动编码器,训练有几个精心设计的目标,包括对抗性损失。我们在具有客观指标和感知研究的内部双耳数据集上评估了所提出的系统。结果表明,所提出的方法比以前的方法更接近地面真相数据。特别是,我们演示了对抗性损失在捕获创建真实听觉场景所需的环境效果中的能力。
In this work, we present an end-to-end binaural speech synthesis system that combines a low-bitrate audio codec with a powerful binaural decoder that is capable of accurate speech binauralization while faithfully reconstructing environmental factors like ambient noise or reverb. The network is a modified vector-quantized variational autoencoder, trained with several carefully designed objectives, including an adversarial loss. We evaluate the proposed system on an internal binaural dataset with objective metrics and a perceptual study. Results show that the proposed approach matches the ground truth data more closely than previous methods. In particular, we demonstrate the capability of the adversarial loss in capturing environment effects needed to create an authentic auditory scene.