Paper Title

Generative Modeling for Low Dimensional Speech Attributes with Neural Spline Flows

Authors

Shih, Kevin J., Valle, Rafael, Badlani, Rohan, Santos, João Felipe, Catanzaro, Bryan

Abstract

Despite recent advances in generative modeling for text-to-speech synthesis, these models do not yet offer the fine-grained adjustability of pitch-conditioned deterministic models such as FastPitch and FastSpeech2. Pitch information is not only low-dimensional but also discontinuous, which makes it particularly difficult to model in a generative setting. Our work explores several techniques for handling these issues in the context of Normalizing Flow models. We also find this problem to be very well suited to Neural Spline Flows, a highly expressive alternative to the affine-coupling mechanism more commonly used in Normalizing Flows.
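The difference between affine coupling and spline coupling can be made concrete with a minimal sketch. It assumes the monotonic rational-quadratic spline of Durkan et al. (2019) as the per-dimension transform and illustrates the general technique, not the authors' implementation. In a real coupling layer the unconstrained bin widths, heights and knot derivatives would be predicted by a neural network from the conditioning inputs. Here they are fixed, hypothetical numbers, and the helper names (`make_knots`, `rq_spline_forward`, `rq_spline_inverse`) and the interval bound of 3.0 are likewise assumptions. Outside the interval the transform is the identity, so it is defined on the whole real line.

```python
import math
from bisect import bisect_right


def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    total = sum(e)
    return [x / total for x in e]


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


def make_knots(raw_w, raw_h, raw_d, bound=3.0):
    """Hypothetical helper: map unconstrained parameters to monotonic knots on [-bound, bound]."""
    K = len(raw_w)
    assert len(raw_h) == K and len(raw_d) == K - 1
    xs, ys = [-bound], [-bound]
    for w, h in zip(softmax(raw_w), softmax(raw_h)):
        xs.append(xs[-1] + 2 * bound * w)
        ys.append(ys[-1] + 2 * bound * h)
    xs[-1] = ys[-1] = bound  # remove floating-point drift at the last knot
    # Interior derivatives are positive; boundary derivatives are 1 so the
    # spline joins the identity tails with a continuous slope.
    ds = [1.0] + [softplus(r) for r in raw_d] + [1.0]
    return xs, ys, ds


def rq_spline_forward(x, xs, ys, ds):
    """Return (y, log|dy/dx|) for a scalar x."""
    if x <= xs[0] or x >= xs[-1]:
        return x, 0.0  # identity tails
    k = bisect_right(xs, x) - 1  # bin index
    w, h = xs[k + 1] - xs[k], ys[k + 1] - ys[k]
    s = h / w
    xi = (x - xs[k]) / w
    om = 1.0 - xi
    denom = s + (ds[k + 1] + ds[k] - 2 * s) * xi * om
    y = ys[k] + h * (s * xi * xi + ds[k] * xi * om) / denom
    num = s * s * (ds[k + 1] * xi * xi + 2 * s * xi * om + ds[k] * om * om)
    return y, math.log(num) - 2 * math.log(denom)


def rq_spline_inverse(y, xs, ys, ds):
    """Invert the spline analytically by solving a quadratic in xi."""
    if y <= ys[0] or y >= ys[-1]:
        return y
    k = bisect_right(ys, y) - 1
    w, h = xs[k + 1] - xs[k], ys[k + 1] - ys[k]
    s = h / w
    t = y - ys[k]
    delta = ds[k + 1] + ds[k] - 2 * s
    a = h * (s - ds[k]) + t * delta
    b = h * ds[k] - t * delta
    c = -s * t
    disc = max(0.0, b * b - 4 * a * c)  # guard tiny negative rounding
    xi = 2 * c / (-b - math.sqrt(disc))  # numerically stable root
    return xs[k] + xi * w


if __name__ == "__main__":
    xs, ys, ds = make_knots([0.5, -0.2, 1.0, 0.1], [-0.3, 0.8, 0.2, 0.0], [0.4, -1.0, 2.0])
    for x in (-4.0, -1.2, 0.0, 2.5):
        y, logdet = rq_spline_forward(x, xs, ys, ds)
        x_back = rq_spline_inverse(y, xs, ys, ds)
        print(f"x={x:+.2f} -> y={y:+.4f}, log|dy/dx|={logdet:+.4f}, inverse={x_back:+.4f}")
```

For fixed conditioning, an affine coupling layer maps each dimension as y = x·exp(s) + t, which is linear in x. The spline is a learned, monotonic, nonlinear map that still has an analytic inverse and log-determinant. That is one source of the extra expressiveness attributed to Neural Spline Flows.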
