M6-Fashion: High-Fidelity Multi-modal Image Generation and Editing

Paper Title

M6-Fashion: High-Fidelity Multi-modal Image Generation and Editing

Authors

Li, Zhikang; Zhou, Huiling; Bai, Shuai; Li, Peike; Zhou, Chang; Yang, Hongxia

Abstract

Multi-modal image generation and editing has diverse applications in the fashion industry. The goal is to create a desired high-fidelity image guided by multi-modal conditional signals. Most existing methods learn different condition guidance controls either by introducing extra models or by ignoring style prior knowledge, which makes it difficult to handle combinations of multiple signals and leads to low-fidelity results. In this paper, we integrate both style prior knowledge and the flexibility of multi-modal control into one unified two-stage framework, M6-Fashion, focusing on practical AI-aided fashion design. In the first stage, it decouples style codes in both spatial and semantic dimensions to guarantee high-fidelity image generation. M6-Fashion utilizes self-correction for non-autoregressive generation to improve inference speed, enhance holistic consistency, and support various signal controls. Extensive experiments on a large-scale clothing dataset, M2C-Fashion, demonstrate superior performance on various image generation and editing tasks. The M6-Fashion model serves as a promising AI designer for the fashion industry.
