Paper Title


DT2I: Dense Text-to-Image Generation from Region Descriptions

Authors

Stanislav Frolov, Prateek Bansal, Jörn Hees, Andreas Dengel

Abstract


Despite astonishing progress, generating realistic images of complex scenes remains a challenging problem. Recently, layout-to-image synthesis approaches have attracted much interest by conditioning the generator on a list of bounding boxes and corresponding class labels. However, previous approaches are very restrictive because the set of labels is fixed a priori. Meanwhile, text-to-image synthesis methods have substantially improved and provide a flexible way for conditional image generation. In this work, we introduce dense text-to-image (DT2I) synthesis as a new task to pave the way toward more intuitive image generation. Furthermore, we propose DTC-GAN, a novel method to generate images from semantically rich region descriptions, and a multi-modal region feature matching loss to encourage semantic image-text matching. Our results demonstrate the capability of our approach to generate plausible images of complex scenes using region captions.
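The abstract mentions a multi-modal region feature matching loss that encourages each generated region to match its caption semantically. The paper's exact formulation is not given here; the following is a minimal sketch under the assumption that the loss compares region image features against region-caption embeddings via cosine similarity, with all function and variable names being illustrative.

```python
import numpy as np

def region_feature_matching_loss(img_feats, txt_feats):
    """Hypothetical multi-modal region feature matching loss (sketch).

    img_feats, txt_feats: (num_regions, dim) arrays of region image
    features and region-caption embeddings. The loss rewards semantic
    image-text matching per region by maximizing cosine similarity;
    it is 0 when every region feature aligns with its caption embedding.
    """
    def l2_normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    a = l2_normalize(img_feats)
    b = l2_normalize(txt_feats)
    cos = np.sum(a * b, axis=1)  # per-region cosine similarity in [-1, 1]
    return 1.0 - cos.mean()      # loss in [0, 2]

# Perfectly aligned features give zero loss.
f = np.random.randn(4, 8)
print(region_feature_matching_loss(f, f))
```

In practice such a loss would be computed on features pooled over each bounding-box region (e.g. via ROI pooling) and added to the adversarial objective; this sketch only shows the matching term itself.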
