Paper title
Optimized latent-code selection for explainable conditional text-to-image GANs
Authors
Abstract
Text-to-image generation has made remarkable progress thanks to advances in conditional generative adversarial networks (GANs). However, existing conditional text-to-image GAN approaches concentrate mostly on improving image quality and semantic relevance and ignore the explainability of the model, which plays a vital role in real-world applications. In this paper, we present a variety of techniques for taking a close look at the latent space and semantic space of a conditional text-to-image GAN model. We introduce pairwise linear interpolation of latent codes and `linguistic' linear interpolation to study what the model has learned within the latent space and the `linguistic' embeddings. We then extend linear interpolation to triangular interpolation conditioned on three corners to analyze the model further. Next, we build a Good/Bad data set containing unsuccessfully and successfully synthesized samples and their corresponding latent codes for image-quality research. Based on this data set, we propose a framework for finding good latent codes using a linear SVM. Experimental results on the recent DiverGAN generator trained on two benchmark data sets qualitatively demonstrate the effectiveness of the presented techniques, with better than 94\% accuracy in predicting ${Good}$/${Bad}$ classes for latent vectors. The Good/Bad data set is publicly available at https://zenodo.org/record/5850224#.YeGMwP7MKUk.
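
The two latent-code interpolation schemes named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the latent dimension, the Gaussian sampling of the codes, the function names, and the use of an evenly spaced barycentric grid for the triangular case are all assumptions made here for illustration. In practice each resulting code would be passed, together with a fixed text embedding, to the generator.

```python
import numpy as np


def linear_interpolation(z_a, z_b, num_steps):
    """Return num_steps codes evenly spaced on the segment from z_a to z_b."""
    weights = np.linspace(0.0, 1.0, num_steps)[:, None]  # shape (num_steps, 1)
    return (1.0 - weights) * z_a + weights * z_b


def triangular_interpolation(z_a, z_b, z_c, num_divisions):
    """Return codes on a barycentric grid over the triangle (z_a, z_b, z_c).

    Each code is w_a * z_a + w_b * z_b + w_c * z_c with non-negative
    weights that sum to 1, so the three corners are included.
    """
    codes = []
    for i in range(num_divisions + 1):
        for j in range(num_divisions + 1 - i):
            k = num_divisions - i - j
            w_a, w_b, w_c = np.array([i, j, k]) / num_divisions
            codes.append(w_a * z_a + w_b * z_b + w_c * z_c)
    return np.stack(codes)


# Hypothetical setup: three random latent codes of an assumed dimension.
rng = np.random.default_rng(0)
latent_dim = 100  # assumption, not taken from the paper
z_a, z_b, z_c = rng.standard_normal((3, latent_dim))

line_codes = linear_interpolation(z_a, z_b, num_steps=8)
triangle_codes = triangular_interpolation(z_a, z_b, z_c, num_divisions=4)
```

With `num_divisions = n`, the triangular grid contains (n + 1)(n + 2) / 2 codes. Generating an image for each one would show how the output changes as the code moves between the three corners.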