Paper Title

End-to-End Visual Editing with a Generatively Pre-Trained Artist

Authors

Andrew Brown, Cheng-Yang Fu, Omkar Parkhi, Tamara L. Berg, Andrea Vedaldi

Abstract

We consider the targeted image editing problem: blending a region in a source image with a driver image that specifies the desired change. Differently from prior works, we solve this problem by learning a conditional probability distribution of the edits, end-to-end. Training such a model requires addressing a fundamental technical challenge: the lack of example edits for training. To this end, we propose a self-supervised approach that simulates edits by augmenting off-the-shelf images in a target domain. The benefits are remarkable: implemented as a state-of-the-art auto-regressive transformer, our approach is simple, sidesteps difficulties with previous methods based on GAN-like priors, obtains significantly better edits, and is efficient. Furthermore, we show that different blending effects can be learned by an intuitive control of the augmentation process, with no other changes required to the model architecture. We demonstrate the superiority of this approach across several datasets in extensive quantitative and qualitative experiments, including human studies, significantly outperforming prior work.
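The key idea in the abstract, simulating edits by augmenting off-the-shelf images so that (source, driver, target) training triples can be built without any real example edits, can be sketched as below. This is a minimal illustrative assumption of what such a self-supervised pipeline might look like (the function name, region sampling, and the specific augmentations are hypothetical, not the paper's exact procedure):

```python
import numpy as np

def make_edit_example(image, rng, region_size=16):
    """Simulate one targeted-edit training triple from a single image.

    Hypothetical sketch: the model would be trained to reconstruct
    `target` from the masked `source` plus the augmented `driver` patch.
    """
    h, w, _ = image.shape
    # Sample a rectangular edit region at random.
    top = int(rng.integers(0, h - region_size + 1))
    left = int(rng.integers(0, w - region_size + 1))
    mask = np.zeros((h, w), dtype=bool)
    mask[top:top + region_size, left:left + region_size] = True

    # Ground-truth target: the unedited image itself.
    target = image

    # Source: the image with the edit region blanked out.
    source = image.copy()
    source[mask] = 0

    # Driver: the region crop, augmented (here a horizontal flip plus
    # brightness jitter, chosen arbitrarily) so the model cannot simply
    # copy pixels verbatim and must learn to blend.
    patch = image[top:top + region_size, left:left + region_size]
    patch = patch[:, ::-1]
    jitter = rng.uniform(0.7, 1.3)
    driver = np.clip(patch.astype(np.float32) * jitter, 0, 255).astype(image.dtype)

    return source, mask, driver, target

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
source, mask, driver, target = make_edit_example(img, rng)
```

Varying the augmentation family here is also how, per the abstract, different blending effects could be learned without touching the model architecture.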
