Paper Title
Controlling generative models with continuous factors of variations
Paper Authors
Abstract
Recent deep generative models are able to provide photo-realistic images as well as visual or textual content embeddings useful for addressing various tasks of computer vision and natural language processing. Their usefulness is nevertheless often limited by the lack of control over the generative process or the poor understanding of the learned representation. To overcome these major issues, recent work has shown the value of studying the semantics of the latent space of generative models. In this paper, we propose to improve the interpretability of the latent space of generative models by introducing a new method to find meaningful directions in the latent space of any generative model, along which we can move to precisely control specific properties of the generated image, such as the position or scale of the object in the image. Our method does not require human annotations and is particularly well suited for finding directions that encode simple transformations of the generated image, such as translations, zooms, or color variations. We demonstrate the effectiveness of our method qualitatively and quantitatively, both for GANs and variational auto-encoders.
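The core idea of the abstract — controlling a generated image by moving the latent code along a meaningful direction — can be sketched minimally. The snippet below is an illustration, not the paper's method: the linear "generator" `W`, the direction `u`, and all names are stand-ins for a trained GAN or VAE decoder and a direction found by the paper's annotation-free search.

```python
import numpy as np

# Toy "generator": a fixed linear map from a 3-D latent space to a
# flattened 8x8 single-channel "image". A real experiment would use a
# trained GAN or VAE decoder; this stand-in just makes the sketch run.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 3))

def generate(z):
    """Map a latent code z to a flattened 8x8 image."""
    return W @ z

# A hypothetical unit direction in latent space assumed to encode one
# factor of variation (e.g. horizontal position). The paper's method
# would discover such a direction without human annotations.
u = np.array([1.0, 0.0, 0.0])
u /= np.linalg.norm(u)

# Controlling the generated image = moving the latent code along u.
z = rng.normal(size=3)
images = [generate(z + alpha * u) for alpha in (-2.0, 0.0, 2.0)]

# Because this toy generator is linear, successive images differ only
# by a multiple of W @ u, i.e. only the encoded factor changes.
print(np.allclose(images[2] - images[1], 2.0 * (W @ u)))
```

With a real (nonlinear) generator the image change along `u` is no longer exactly linear, but sweeping the scalar `alpha` is still how one traverses the direction to vary a single property of the output.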