Paper Title
Learning Efficient GANs for Image Translation via Differentiable Masks and co-Attention Distillation
Paper Authors
Paper Abstract
Generative Adversarial Networks (GANs) have been widely used in image translation, but their high computation and storage costs impede deployment on mobile devices. Prevalent methods for CNN compression cannot be directly applied to GANs due to the peculiarities of GAN tasks and the instability of adversarial training. To address these issues, this paper introduces a novel GAN compression method, termed DMAD, built on a Differentiable Mask and a co-Attention Distillation. The former searches for a light-weight generator architecture in a training-adaptive manner; to overcome channel inconsistency when pruning residual connections, an adaptive cross-block group sparsity is further incorporated. The latter simultaneously distills informative attention maps from both the generator and the discriminator of a pre-trained model into the searched generator, effectively stabilizing the adversarial training of the light-weight model. Experiments show that DMAD reduces the Multiply-Accumulate Operations (MACs) of CycleGAN by 13x and those of Pix2Pix by 4x while retaining performance comparable to the full models. Our code is available at https://github.com/SJLeo/DMAD.
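To make the attention-distillation idea concrete, below is a minimal sketch of activation-based attention transfer: a feature map is collapsed into a spatial attention map by summing squared activations over channels, and the student is penalized for deviating from the teacher's map. This is the generic attention-transfer recipe, not the paper's exact DMAD loss; the function names, normalization choice, and feature shapes here are illustrative assumptions.

```python
import numpy as np

def attention_map(feat):
    """Collapse a feature map of shape (C, H, W) into a spatial attention
    map of shape (H, W) by summing squared activations over channels,
    then L2-normalising. Channel counts can differ between teacher and
    student, since the channel axis is reduced away."""
    amap = np.sum(feat ** 2, axis=0)
    return amap / (np.linalg.norm(amap) + 1e-8)

def attention_distill_loss(teacher_feat, student_feat):
    """Hypothetical distillation term: mean squared error between the
    teacher's and student's normalised attention maps."""
    t = attention_map(teacher_feat)
    s = attention_map(student_feat)
    return float(np.mean((t - s) ** 2))

# Toy usage: a pruned student with fewer channels than its teacher
# can still be matched, because attention maps are channel-agnostic.
rng = np.random.default_rng(0)
teacher = rng.standard_normal((64, 8, 8))   # e.g. teacher generator features
student = rng.standard_normal((16, 8, 8))   # searched light-weight generator
loss = attention_distill_loss(teacher, student)
```

In DMAD this kind of term is applied to features from both the pre-trained generator and discriminator, which is what makes the distillation "co-attention" rather than generator-only.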