Paper Title
Delving StyleGAN Inversion for Image Editing: A Foundation Latent Space Viewpoint
Paper Authors
Paper Abstract
GAN inversion and editing via StyleGAN map an input image into the embedding spaces ($\mathcal{W}$, $\mathcal{W^+}$, and $\mathcal{F}$) to simultaneously maintain image fidelity and meaningful manipulation. From the latent space $\mathcal{W}$ to the extended latent space $\mathcal{W^+}$ to the feature space $\mathcal{F}$ in StyleGAN, the editability of GAN inversion decreases while its reconstruction quality increases. Recent GAN inversion methods typically explore $\mathcal{W^+}$ and $\mathcal{F}$ rather than $\mathcal{W}$ to improve reconstruction fidelity while maintaining editability. As $\mathcal{W^+}$ and $\mathcal{F}$ are derived from $\mathcal{W}$, which is essentially the foundation latent space of StyleGAN, GAN inversion methods focusing on the $\mathcal{W^+}$ and $\mathcal{F}$ spaces could be improved by stepping back to $\mathcal{W}$. In this work, we propose to first obtain a precise latent code in the foundation latent space $\mathcal{W}$. We introduce contrastive learning to align $\mathcal{W}$ with the image space for precise latent code discovery. Then, we leverage a cross-attention encoder to transform the obtained latent code in $\mathcal{W}$ into $\mathcal{W^+}$ and $\mathcal{F}$, respectively. Our experiments show that our exploration of the foundation latent space $\mathcal{W}$ improves the representation ability of latent codes in $\mathcal{W^+}$ and features in $\mathcal{F}$, yielding state-of-the-art reconstruction fidelity and editability on standard benchmarks. Project page: https://kumapowerliu.github.io/CLCAE.
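The abstract names two mechanisms: contrastive alignment between the image space and $\mathcal{W}$, and a cross-attention encoder that lifts the $\mathcal{W}$ code into $\mathcal{W^+}$ (and, analogously, $\mathcal{F}$). Below is a minimal PyTorch sketch of both ideas, not the authors' implementation: the alignment loss is a standard CLIP-style InfoNCE over a batch, and the lift uses learned per-layer queries attending to the $w$ code. The dimensions (512-dim codes, 18 $\mathcal{W^+}$ layers, as in StyleGAN2 at 1024×1024) and all names (`contrastive_alignment_loss`, `CrossAttentionLift`) are illustrative assumptions.

```python
# Hedged sketch of (1) contrastive alignment between image embeddings and
# W-space latent codes, and (2) a cross-attention lift from W to W+.
# Module names, dimensions, and hyperparameters are assumptions, not the
# paper's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

def contrastive_alignment_loss(img_emb, w_codes, temperature=0.07):
    """InfoNCE over a batch: the i-th image embedding should match the
    i-th latent code and repel all other codes in the batch."""
    img_emb = F.normalize(img_emb, dim=-1)
    w_codes = F.normalize(w_codes, dim=-1)
    logits = img_emb @ w_codes.t() / temperature      # (B, B) similarities
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    # Symmetric cross-entropy, as in CLIP-style alignment.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

class CrossAttentionLift(nn.Module):
    """Lift one w in W (512-dim) to 18 per-layer codes in W+ by letting
    learned per-layer queries cross-attend to the w code. (A full model
    would also attend to image features; omitted for brevity.)"""
    def __init__(self, dim=512, num_layers=18, num_heads=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_layers, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, w):                              # w: (B, 512)
        B = w.size(0)
        q = self.queries.unsqueeze(0).expand(B, -1, -1)  # (B, 18, 512)
        kv = w.unsqueeze(1)                              # (B, 1, 512)
        delta, _ = self.attn(q, kv, kv)
        # Predict per-layer offsets around the shared base code.
        return w.unsqueeze(1) + delta                  # (B, 18, 512) in W+

# Usage with random stand-in tensors (no real encoder or StyleGAN here):
imgs, ws = torch.randn(8, 512), torch.randn(8, 512)
loss = contrastive_alignment_loss(imgs, ws)
w_plus = CrossAttentionLift()(ws)
```

Predicting $\mathcal{W^+}$ as offsets around the shared base code mirrors the abstract's ordering: a precise $\mathcal{W}$ code is found first, and the richer spaces refine it rather than being regressed from scratch.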