Paper Title

Multimodal Image-to-Image Translation via Mutual Information Estimation and Maximization

Paper Authors

Zhiwen Zuo, Lei Zhao, Zhizhong Wang, Haibo Chen, Ailin Li, Qijiang Xu, Wei Xing, Dongming Lu

Paper Abstract

Multimodal image-to-image translation (I2IT) aims to learn a conditional distribution that explores multiple possible images in the target domain given an input image in the source domain. Conditional generative adversarial networks (cGANs) are often adopted for modeling such a conditional distribution. However, cGANs are prone to ignoring the latent code and learning a unimodal distribution in conditional image synthesis, which is also known as the mode collapse issue of GANs. To solve this problem, we propose in this paper a simple yet effective method that explicitly estimates and maximizes the mutual information between the latent code and the output image in cGANs by using a deep mutual information neural estimator. Maximizing the mutual information strengthens the statistical dependency between the latent code and the output image, which prevents the generator from ignoring the latent code and encourages cGANs to fully utilize the latent code for synthesizing diverse results. Our method not only provides a new information-theoretic perspective on improving diversity for I2IT but also achieves disentanglement between the source-domain content and the target-domain style for free.
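
The "deep mutual information neural estimator" mentioned in the abstract belongs to the MINE family (Belghazi et al., 2018), which trains a statistics network to tighten the Donsker-Varadhan lower bound on mutual information. The PyTorch sketch below illustrates this general idea, assuming latent codes z and generator-output features y with hypothetical dimensions latent_dim and feat_dim; it is a minimal illustration of that estimator family, not the authors' released implementation.

```python
import math
import torch
import torch.nn as nn


class StatisticsNetwork(nn.Module):
    """T(z, y): scores latent-code/image-feature pairs for the
    Donsker-Varadhan bound  I(Z; Y) >= E_p(z,y)[T] - log E_p(z)p(y)[e^T].
    All layer sizes and input dimensions here are illustrative assumptions."""

    def __init__(self, latent_dim: int = 8, feat_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, z: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([z, y], dim=1))


def mi_lower_bound(T: StatisticsNetwork, z: torch.Tensor,
                   y: torch.Tensor) -> torch.Tensor:
    """Batch estimate of the DV lower bound on I(Z; Y).

    Joint samples pair each latent code with the features of the image it
    produced; marginal samples are formed by shuffling y within the batch.
    """
    joint_term = T(z, y).mean()
    y_shuffled = y[torch.randperm(y.size(0))]
    marginal_term = torch.logsumexp(T(z, y_shuffled), dim=0) - math.log(y.size(0))
    return joint_term - marginal_term.squeeze()


# Usage sketch: during cGAN training, add  -lambda * mi_lower_bound(T, z, feats)
# to the generator loss (and ascend the same bound for T), so that a generator
# that ignores its latent code is penalized.
```

Shuffling y within the batch is the standard way to draw approximate samples from the product of marginals without a second forward pass through the generator.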
