Paper Title

MGD-GAN: Text-to-Pedestrian generation through Multi-Grained Discrimination

Authors

Shengyu Zhang, Donghui Wang, Zhou Zhao, Siliang Tang, Di Xie, Fei Wu

Abstract

In this paper, we investigate the problem of text-to-pedestrian synthesis, which has many potential applications in art, design, and video surveillance. Existing methods for text-to-bird/flower synthesis are still far from solving this fine-grained image generation problem, due to the complex structure and heterogeneous appearance that pedestrians naturally take on. To this end, we propose the Multi-Grained Discrimination enhanced Generative Adversarial Network (MGD-GAN), which capitalizes on a Human-Part-based Discriminator (HPD) and a Self-Cross-Attended (SCA) global discriminator to capture the coherence of the complex body structure. A fine-grained word-level attention mechanism is employed in the HPD module to enforce diversified appearance and vivid details. In addition, two pedestrian generation metrics, named Pose Score and Pose Variance, are devised to evaluate generation quality and diversity, respectively. We conduct extensive experiments and ablation studies on the caption-annotated pedestrian dataset, the CUHK Person Description Dataset. Substantial improvements on various metrics demonstrate the efficacy of MGD-GAN on the text-to-pedestrian synthesis scenario.
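The abstract does not give implementation details, but the core idea — scoring an image both globally against the whole caption and per body part with word-level attention — can be illustrated with a toy sketch. The following NumPy snippet is our own illustrative approximation: the function names, feature shapes, and the dot-product attention/scoring are assumptions for exposition, not the paper's actual architecture.

```python
import numpy as np

def word_level_attention(part_feats, word_embs):
    """Attend each body-part feature over caption word embeddings.

    part_feats: (P, d) array, one feature vector per body part (assumed shape).
    word_embs:  (T, d) array, one embedding per caption word (assumed shape).
    Returns a (P, d) context vector per part: a softmax-weighted mix of words.
    """
    scores = part_feats @ word_embs.T                      # (P, T) similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)          # row-wise softmax
    return weights @ word_embs                             # (P, d) contexts

def multi_grained_score(global_feat, part_feats, word_embs):
    """Combine a global image-sentence score with part-word alignment scores.

    This mirrors the multi-grained idea only schematically: the real HPD and
    SCA discriminators are learned networks, not fixed dot products.
    """
    ctx = word_level_attention(part_feats, word_embs)
    part_score = np.mean(np.sum(part_feats * ctx, axis=1))  # part-text match
    sent_emb = word_embs.mean(axis=0)                       # crude sentence emb
    global_score = global_feat @ sent_emb                   # global match
    return float(global_score + part_score)
```

In a trained GAN these scores would be produced by discriminator networks and used in an adversarial loss; here the dot products merely show how per-part, word-attended evidence can be aggregated alongside a global term.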
