Paper Title

Exploiting Pre-trained Feature Networks for Generative Adversarial Networks in Audio-domain Loop Generation

Paper Authors

Yen-Tung Yeh, Bo-Yu Chen, Yi-Hsuan Yang

Paper Abstract

While generative adversarial networks (GANs) have been widely used in research on audio generation, the training of a GAN model is known to be unstable, time-consuming, and data inefficient. Among the attempts to ameliorate the training process of GANs, the idea of Projected GAN emerges as an effective solution for GAN-based image generation, establishing the state-of-the-art in different image applications. The core idea is to use a pre-trained classifier to constrain the feature space of the discriminator to stabilize and improve GAN training. This paper investigates whether Projected GAN can similarly improve audio generation, by evaluating the performance of a StyleGAN2-based audio-domain loop generation model with and without using a pre-trained feature space in the discriminator. Moreover, we compare the performance of using a general versus domain-specific classifier as the pre-trained audio classifier. With experiments on both drum loop and synth loop generation, we show that a general audio classifier works better, and that with Projected GAN our loop generation models can converge around 5 times faster without performance degradation.
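To make the core idea concrete, below is a minimal PyTorch sketch (not the authors' implementation) of a "projected" discriminator: a frozen, pre-trained feature network maps inputs, e.g. mel-spectrograms of loops, into a fixed feature space, and only a small head on top of those features is trained as the discriminator. The FrozenFeatureNet module here is a hypothetical stand-in for the pre-trained audio classifier used in the paper; the actual Projected GAN additionally uses multi-scale features and random feature projections, which are omitted here for brevity.

```python
# Minimal sketch of the Projected GAN discriminator idea (illustrative only):
# the discriminator never sees raw inputs directly; a frozen feature network
# projects them into a fixed feature space, and a small trainable head
# discriminates real vs. fake in that space.
import torch
import torch.nn as nn

class FrozenFeatureNet(nn.Module):
    """Stand-in for a pre-trained audio classifier backbone (weights frozen)."""
    def __init__(self, in_ch=1, feat_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
        )
        for p in self.parameters():  # freeze: these weights are never updated
            p.requires_grad = False

    def forward(self, x):
        return self.net(x)

class ProjectedDiscriminator(nn.Module):
    """Small trainable head operating on the frozen feature space."""
    def __init__(self, feature_net, feat_ch=64):
        super().__init__()
        self.feature_net = feature_net
        self.head = nn.Sequential(
            nn.Conv2d(feat_ch, feat_ch, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(feat_ch, 1, kernel_size=1),  # per-patch real/fake logits
        )

    def forward(self, x):
        with torch.no_grad():  # only the head receives gradients
            feats = self.feature_net(x)
        return self.head(feats)

# Toy usage on a batch of mel-spectrogram-like inputs (shapes are illustrative).
if __name__ == "__main__":
    D = ProjectedDiscriminator(FrozenFeatureNet())
    specs = torch.randn(4, 1, 80, 128)  # (batch, channel, mel bins, frames)
    print(D(specs).shape)
```

In this setup, only the head's parameters are passed to the discriminator optimizer; the frozen backbone supplies a stable feature space, which is the mechanism the paper credits for faster and more stable GAN convergence.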
