Paper Title


StyleMask: Disentangling the Style Space of StyleGAN2 for Neural Face Reenactment

Paper Authors

Stella Bounareli, Christos Tzelepis, Vasileios Argyriou, Ioannis Patras, Georgios Tzimiropoulos

Paper Abstract


In this paper we address the problem of neural face reenactment, where, given a pair of a source and a target facial image, we need to transfer the target's pose (defined as the head pose and its facial expressions) to the source image, while at the same time preserving the source's identity characteristics (e.g., facial shape, hair style, etc.), even in the challenging case where the source and the target faces belong to different identities. In doing so, we address some of the limitations of state-of-the-art works, namely: a) that they depend on paired training data (i.e., source and target faces have the same identity); b) that they rely on labeled data during inference; and c) that they do not preserve identity under large head pose changes. More specifically, we propose a framework that, using unpaired randomly generated facial images, learns to disentangle the identity characteristics of the face from its pose by incorporating the recently introduced style space $\mathcal{S}$ of StyleGAN2, a latent representation space that exhibits remarkable disentanglement properties. By capitalizing on this, we learn to successfully mix a pair of source and target style codes using supervision from a 3D model. The resulting latent code, which is subsequently used for reenactment, consists of latent units corresponding to the facial pose of the target only and of units corresponding to the identity of the source only, leading to a notable improvement in reenactment performance compared to recent state-of-the-art methods. In comparison to the state of the art, we quantitatively and qualitatively show that the proposed method produces higher-quality results even under extreme pose variations. Finally, we report results on real images by first embedding them into the latent space of the pretrained generator. We make the code and pretrained models publicly available at: https://github.com/StelaBou/StyleMask
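The core mechanism the abstract describes is a channel-wise blend of two style codes in StyleGAN2's style space $\mathcal{S}$, gated by a learned mask so that pose/expression channels are taken from the target code and identity channels from the source code. Below is a minimal PyTorch sketch of that idea; the `MaskNetwork` architecture, the channel count, and the `mix_style_codes` helper are illustrative assumptions, not the authors' released implementation (which additionally trains the mask with losses derived from a 3D face model, omitted here).

```python
# Minimal sketch of learned style-code mixing in StyleGAN2's style space S.
# All names and sizes are illustrative assumptions, not the paper's exact code.
import torch
import torch.nn as nn


class MaskNetwork(nn.Module):
    """Predicts per-channel blending weights over the style space S.

    Channels gated toward the target code should carry head pose and
    expression; channels gated toward the source code should carry identity.
    """

    def __init__(self, num_style_channels: int = 6048):  # channel count is an assumption
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * num_style_channels, 1024),
            nn.ReLU(),
            nn.Linear(1024, num_style_channels),
            nn.Sigmoid(),  # mask values in [0, 1]
        )

    def forward(self, s_source: torch.Tensor, s_target: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([s_source, s_target], dim=-1))


def mix_style_codes(s_source: torch.Tensor,
                    s_target: torch.Tensor,
                    mask_net: MaskNetwork) -> torch.Tensor:
    """Blend two style codes channel-wise: identity units from the source,
    pose/expression units from the target."""
    m = mask_net(s_source, s_target)
    return (1.0 - m) * s_source + m * s_target


if __name__ == "__main__":
    mask_net = MaskNetwork(num_style_channels=6048)
    s_src = torch.randn(1, 6048)  # style code of the source face
    s_tgt = torch.randn(1, 6048)  # style code of the target face
    s_mixed = mix_style_codes(s_src, s_tgt, mask_net)
    print(s_mixed.shape)  # the mixed code would then be fed to the generator
```

In this sketch the mask is a function of both codes; during training, supervision from a 3D model would push the sigmoid outputs toward a binary partition of $\mathcal{S}$ into pose-related and identity-related units, which is what allows the same network to work for unpaired source/target identities.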
