Paper Title

Quantized GAN for Complex Music Generation from Dance Videos

Paper Authors

Ye Zhu, Kyle Olszewski, Yu Wu, Panos Achlioptas, Menglei Chai, Yan Yan, Sergey Tulyakov

Paper Abstract

We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates complex musical samples conditioned on dance videos. Our proposed framework takes dance video frames and human body motions as input, and learns to generate music samples that plausibly accompany the corresponding input. Unlike most existing conditional music generation works that generate specific types of mono-instrumental sounds using symbolic audio representations (e.g., MIDI), and that usually rely on pre-defined musical synthesizers, in this work we generate dance music in complex styles (e.g., pop, breaking, etc.) by employing a Vector Quantized (VQ) audio representation, and leverage both its generality and high abstraction capacity of its symbolic and continuous counterparts. By performing an extensive set of experiments on multiple datasets, and following a comprehensive evaluation protocol, we assess the generative qualities of our proposal against alternatives. The attained quantitative results, which measure the music consistency, beats correspondence, and music diversity, demonstrate the effectiveness of our proposed method. Last but not least, we curate a challenging dance-music dataset of in-the-wild TikTok videos, which we use to further demonstrate the efficacy of our approach in real-world applications -- and which we hope to serve as a starting point for relevant future research.
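The abstract's key design choice is generating music over a Vector Quantized (VQ) audio representation rather than symbolic MIDI or raw waveforms. The sketch below illustrates the core VQ operation only: mapping continuous feature vectors to the nearest entries of a learned codebook to obtain discrete audio tokens. The codebook size, feature dimension, and variable names are illustrative assumptions, not details from the paper.

```python
import numpy as np

# Minimal sketch of vector quantization (VQ), assuming a 512-entry
# codebook of 64-dim vectors; real systems learn the codebook jointly
# with an encoder/decoder.
rng = np.random.default_rng(0)

codebook = rng.normal(size=(512, 64))  # learned code vectors (assumed shape)
features = rng.normal(size=(100, 64))  # continuous encoder outputs (assumed)

# Nearest-neighbor lookup: squared L2 distance to every codebook entry.
dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
codes = dists.argmin(axis=1)           # discrete token indices, one per frame
quantized = codebook[codes]            # quantized continuous representation

print(codes.shape, quantized.shape)    # (100,) (100, 64)
```

The resulting `codes` are compact discrete symbols (like MIDI events) while `quantized` retains continuous expressiveness, which is the "generality and high abstraction capacity" trade-off the abstract refers to.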
