Paper Title
Delving StyleGAN Inversion for Image Editing: A Foundation Latent Space Viewpoint
Paper Authors
Paper Abstract
GAN inversion and editing via StyleGAN map an input image into the embedding spaces ($\mathcal{W}$, $\mathcal{W^+}$, and $\mathcal{F}$) to simultaneously maintain image fidelity and meaningful manipulation. From the latent space $\mathcal{W}$ to the extended latent space $\mathcal{W^+}$ to the feature space $\mathcal{F}$ in StyleGAN, the editability of GAN inversion decreases while its reconstruction quality increases. Recent GAN inversion methods typically explore $\mathcal{W^+}$ and $\mathcal{F}$ rather than $\mathcal{W}$ to improve reconstruction fidelity while maintaining editability. As $\mathcal{W^+}$ and $\mathcal{F}$ are derived from $\mathcal{W}$, which is essentially the foundation latent space of StyleGAN, GAN inversion methods focusing on the $\mathcal{W^+}$ and $\mathcal{F}$ spaces could be improved by stepping back to $\mathcal{W}$. In this work, we propose to first obtain a precise latent code in the foundation latent space $\mathcal{W}$. We introduce contrastive learning to align $\mathcal{W}$ with the image space for precise latent code discovery. Then, we leverage a cross-attention encoder to transform the obtained latent code in $\mathcal{W}$ into $\mathcal{W^+}$ and $\mathcal{F}$, respectively. Our experiments show that our exploration of the foundation latent space $\mathcal{W}$ improves the representation ability of latent codes in $\mathcal{W^+}$ and features in $\mathcal{F}$, yielding state-of-the-art reconstruction fidelity and editability on standard benchmarks. Project page: https://kumapowerliu.github.io/CLCAE.
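The abstract names two mechanisms: contrastive alignment between the image space and $\mathcal{W}$, and a cross-attention encoder that lifts the $\mathcal{W}$ code into $\mathcal{W^+}$ (and, analogously, $\mathcal{F}$). Below is a minimal PyTorch sketch of both ideas, not the authors' implementation: the alignment loss is a standard CLIP-style InfoNCE over a batch, and the lift uses learned per-layer queries attending to the $w$ code. The dimensions (512-dim codes, 18 $\mathcal{W^+}$ layers, as in StyleGAN2 at 1024×1024) and all names (`contrastive_alignment_loss`, `CrossAttentionLift`) are illustrative assumptions.

```python
# Hedged sketch of (1) contrastive alignment between image embeddings and
# W-space latent codes, and (2) a cross-attention lift from W to W+.
# Module names, dimensions, and hyperparameters are assumptions, not the
# paper's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

def contrastive_alignment_loss(img_emb, w_codes, temperature=0.07):
    """InfoNCE over a batch: the i-th image embedding should match the
    i-th latent code and repel all other codes in the batch."""
    img_emb = F.normalize(img_emb, dim=-1)
    w_codes = F.normalize(w_codes, dim=-1)
    logits = img_emb @ w_codes.t() / temperature      # (B, B) similarities
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    # Symmetric cross-entropy, as in CLIP-style alignment.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

class CrossAttentionLift(nn.Module):
    """Lift one w in W (512-dim) to 18 per-layer codes in W+ by letting
    learned per-layer queries cross-attend to the w code. (A full model
    would also attend to image features; omitted for brevity.)"""
    def __init__(self, dim=512, num_layers=18, num_heads=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_layers, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, w):                              # w: (B, 512)
        B = w.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)  # (B, 18, 512)
        kv = w.unsqueeze(1)                              # (B, 1, 512)
        delta, _ = self.attn(q, kv, kv)
        # Predict per-layer offsets around the shared base code.
        return w.unsqueeze(1) + delta                  # (B, 18, 512) in W+

# Usage with random stand-in tensors (no real encoder or StyleGAN here):
imgs, ws = torch.randn(8, 512), torch.randn(8, 512)
loss = contrastive_alignment_loss(imgs, ws)
w_plus = CrossAttentionLift()(ws)
```

Predicting $\mathcal{W^+}$ as offsets around the shared base code mirrors the abstract's ordering: a precise $\mathcal{W}$ code is found first, and the richer spaces refine it rather than being regressed from scratch.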