Paper Title
Pretraining is All You Need for Image-to-Image Translation
Paper Authors
Paper Abstract
We propose to use pretraining to boost general image-to-image translation. Prior image-to-image translation methods usually need dedicated architectural designs and train individual translation models from scratch, struggling to produce high-quality results for complex scenes, especially when paired training data are not abundant. In this paper, we regard each image-to-image translation problem as a downstream task and introduce a simple and generic framework that adapts a pretrained diffusion model to accommodate various kinds of image-to-image translation. We also propose adversarial training to enhance texture synthesis during diffusion model training, in conjunction with normalized guidance sampling to improve generation quality. We present extensive empirical comparisons across various tasks on challenging benchmarks such as ADE20K, COCO-Stuff, and DIODE, showing that the proposed pretraining-based image-to-image translation (PITI) is capable of synthesizing images of unprecedented realism and faithfulness.
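The abstract mentions adapting a pretrained diffusion model to downstream translation tasks and a "normalized guidance sampling" step. The sketch below is only a rough illustration of that idea: it applies standard classifier-free guidance and then rescales the guided noise prediction per sample. The function name `normalized_guidance`, the `eps_model(x_t, t, cond)` interface, and the choice of matching the conditional prediction's standard deviation are assumptions for illustration, not the paper's actual formulation.

```python
# Illustrative sketch of guidance sampling with a normalization step.
# Assumes a diffusion model `eps_model(x_t, t, cond)` that predicts noise,
# with `cond=None` meaning the unconditional (null-condition) branch.
import torch


def normalized_guidance(eps_model, x_t, t, cond, guidance_scale=3.0):
    # Conditional and unconditional noise predictions.
    eps_cond = eps_model(x_t, t, cond)
    eps_uncond = eps_model(x_t, t, None)

    # Standard classifier-free guidance.
    eps_guided = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

    # Normalization step (assumption, for illustration only): rescale the
    # guided prediction so its per-sample standard deviation matches that of
    # the conditional prediction, so large guidance scales do not inflate
    # the prediction magnitude.
    dims = tuple(range(1, x_t.dim()))
    std_cond = eps_cond.std(dim=dims, keepdim=True)
    std_guided = eps_guided.std(dim=dims, keepdim=True)
    return eps_guided * (std_cond / (std_guided + 1e-8))
```

In use, this function would replace the plain guided prediction inside an ordinary diffusion sampling loop; everything else (noise schedule, denoising update) stays unchanged.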