Paper Title

Don't Be So Dense: Sparse-to-Sparse GAN Training Without Sacrificing Performance

Paper Authors

Shiwei Liu, Yuesong Tian, Tianlong Chen, Li Shen

Paper Abstract

Generative adversarial networks (GANs) have received surging interest since they were proposed, owing to the high quality of the data they generate. While achieving increasingly impressive results, the resource demands associated with their large model size hinder the use of GANs in resource-limited scenarios. For inference, existing model compression techniques can reduce model complexity while retaining comparable performance. However, the training efficiency of GANs has been less explored because of their fragile training process. In this paper, we, for the first time, explore the possibility of directly training sparse GANs from scratch without involving any dense or pre-training steps. Even more unconventionally, our proposed method enables directly training unbalanced sparse GANs, with an extremely sparse generator, from scratch. Instead of training full GANs, we start with sparse GANs and dynamically explore the parameter space spanned over the generator throughout training. Such a sparse-to-sparse training procedure progressively enhances the capacity of the highly sparse generator while sticking to a fixed, small parameter budget, with appealing training and inference efficiency gains. Extensive experiments with modern GAN architectures validate the effectiveness of our method. Our sparsified GANs, trained from scratch in a single run, are able to outperform those learned by expensive iterative pruning and re-training. Perhaps most importantly, we find that, instead of inheriting parameters from expensive pre-trained GANs, directly training sparse GANs from scratch can be a much more efficient solution. For example, training with only an 80% sparse generator and a 70% sparse discriminator, our method achieves even better performance than the dense BigGAN.
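The abstract describes a sparse-to-sparse (dynamic sparse) training procedure: the networks start sparse, and the generator's connectivity is periodically updated under a fixed parameter budget. The sketch below illustrates one common way such a prune-and-regrow mask update can be implemented in PyTorch, using magnitude-based pruning and random regrowth; the function names (`init_masks`, `update_masks`, `apply_masks`), the hyperparameters, and the growth criterion are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the paper's implementation) of sparse-to-sparse training
# bookkeeping: binary masks keep a fixed parameter budget, and connections are
# periodically pruned by magnitude and regrown at random inactive positions.
import torch
import torch.nn as nn


def init_masks(model: nn.Module, sparsity: float = 0.8) -> dict:
    """Randomly sparsify all weight matrices from scratch at a fixed sparsity."""
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:                        # skip biases / norm parameters
            continue
        mask = (torch.rand_like(param) > sparsity).float()
        param.data.mul_(mask)                      # zero out inactive weights
        masks[name] = mask
    return masks


@torch.no_grad()
def update_masks(model: nn.Module, masks: dict, prune_rate: float = 0.3) -> None:
    """One prune-and-regrow step: drop the weakest active connections and
    regrow the same number at random inactive positions (budget unchanged)."""
    for name, param in model.named_parameters():
        if name not in masks:
            continue
        mask = masks[name]
        n_prune = int(prune_rate * mask.sum().item())
        if n_prune == 0:
            continue
        # Prune the lowest-magnitude active weights (inactive slots excluded).
        scores = param.abs() * mask
        scores[mask == 0] = float("inf")
        drop = torch.topk(scores.flatten(), n_prune, largest=False).indices
        mask.view(-1)[drop] = 0.0
        param.view(-1)[drop] = 0.0
        # Regrow at random inactive positions; new weights start from zero.
        inactive = (mask.view(-1) == 0).nonzero(as_tuple=False).flatten()
        grow = inactive[torch.randperm(inactive.numel())[:n_prune]]
        mask.view(-1)[grow] = 1.0


@torch.no_grad()
def apply_masks(model: nn.Module, masks: dict) -> None:
    """Re-apply masks after each optimizer step so pruned weights stay zero."""
    for name, param in model.named_parameters():
        if name in masks:
            param.mul_(masks[name])
```

In a typical training loop, `apply_masks` would be called on the generator and discriminator after every optimizer step to keep pruned weights at zero, and `update_masks` would be called on the generator every few hundred iterations; the actual update schedule, per-layer sparsities, and growth rule used by the authors may differ.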
