Paper title
Structural and Functional Decomposition for Personality Image Captioning in a Communication Game
Paper authors
Paper abstract
Personality image captioning (PIC) aims to describe an image with a natural language caption given a personality trait. In this work, we introduce a novel formulation for PIC based on a communication game between a speaker and a listener. The speaker attempts to generate natural language captions, while the listener encourages the generated captions to contain discriminative information about the input images and personality traits. In this way, we expect the generated captions to represent the images more naturally and to express the traits more clearly. In addition, we propose to adapt the language model GPT2 to perform caption generation for PIC. This enables both the speaker and the listener to benefit from the language encoding capacity of GPT2. Our experiments show that the proposed model achieves state-of-the-art performance for PIC.
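To make the speaker-listener idea concrete, the following is a minimal sketch of how a listener signal could be folded into a speaker objective. It is not the model described in the abstract: the function names, the softmax-over-candidates listener, and the `weight` parameter are all assumptions introduced for illustration. The abstract states only that the listener encourages captions to be discriminative with respect to the image and the trait.

```python
import math


def listener_log_prob(scores, target_index):
    """Log-probability that a (hypothetical) listener picks the target.

    `scores` holds one compatibility score per candidate (image, trait)
    pair for a given caption; the listener is assumed to be a softmax
    over these scores.
    """
    max_score = max(scores)
    # Numerically stable log-sum-exp over all candidates.
    log_norm = max_score + math.log(
        sum(math.exp(s - max_score) for s in scores)
    )
    return scores[target_index] - log_norm


def speaker_loss(caption_nll, scores, target_index, weight=1.0):
    """Assumed combined objective: caption likelihood plus listener term.

    `caption_nll` is the negative log-likelihood of the caption under the
    speaker; the listener term rewards captions from which the target
    (image, trait) pair can be identified among the distractors.
    """
    return caption_nll - weight * listener_log_prob(scores, target_index)
```

Under these assumptions, a caption that lets the listener single out the correct image and trait gets a lower loss than an equally fluent caption that fits every candidate equally well.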