Paper Title

Every picture tells a story: Image-grounded controllable stylistic story generation

Paper Authors

Holy Lovenia, Bryan Wilie, Romain Barraud, Samuel Cahyawijaya, Willy Chung, Pascale Fung

Paper Abstract

Generating a short story out of an image is arduous. Unlike image captioning, story generation from an image poses multiple challenges: preserving the story coherence, appropriately assessing the quality of the story, steering the generated story into a certain style, and addressing the scarcity of image-story pair reference datasets limiting supervision during training. In this work, we introduce Plug-and-Play Story Teller (PPST) and improve image-to-story generation by: 1) alleviating the data scarcity problem by incorporating large pre-trained models, namely CLIP and GPT-2, to facilitate a fluent image-to-text generation with minimal supervision, and 2) enabling a more style-relevant generation by incorporating stylistic adapters to control the story generation. We conduct image-to-story generation experiments with non-styled, romance-styled, and action-styled PPST approaches and compare our generated stories with those of previous work over three aspects, i.e., story coherence, image-story relevance, and style fitness, using both automatic and human evaluation. The results show that PPST improves story coherence and has better image-story relevance, but has yet to be adequately stylistic.
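
To illustrate the plug-and-play idea sketched in the abstract, the snippet below shows one common way to couple a frozen caption/story generator (GPT-2) with a frozen image-text matcher (CLIP) without any paired image-story supervision: GPT-2 proposes candidate continuations and CLIP reranks them against the image. The model names, prompt, and reranking heuristic are illustrative assumptions for a minimal sketch, not the authors' exact PPST implementation, and the stylistic adapters described in the paper are omitted.

```python
# Minimal sketch (assumed, not the official PPST code): GPT-2 proposes
# candidate stories, CLIP scores them against the image, and the best
# match is kept. No image-story training pairs are required.
import torch
from PIL import Image
from transformers import GPT2LMHeadModel, GPT2Tokenizer, CLIPModel, CLIPProcessor

gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
gpt2_tok = GPT2Tokenizer.from_pretrained("gpt2")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def generate_story(image_path: str,
                   prompt: str = "Once upon a time,",
                   num_candidates: int = 8) -> str:
    """Sample several GPT-2 continuations and keep the one CLIP ranks
    as most relevant to the image (zero-shot image-story relevance)."""
    image = Image.open(image_path).convert("RGB")

    # 1) GPT-2 proposes candidate stories from a generic prompt.
    inputs = gpt2_tok(prompt, return_tensors="pt")
    outputs = gpt2.generate(
        **inputs,
        do_sample=True,
        top_p=0.9,
        max_new_tokens=60,
        num_return_sequences=num_candidates,
        pad_token_id=gpt2_tok.eos_token_id,
    )
    candidates = [gpt2_tok.decode(o, skip_special_tokens=True) for o in outputs]

    # 2) CLIP scores each candidate against the image; keep the best match.
    clip_inputs = clip_proc(
        text=candidates, images=image,
        return_tensors="pt", padding=True, truncation=True,
    )
    with torch.no_grad():
        scores = clip(**clip_inputs).logits_per_image[0]  # shape: (num_candidates,)
    return candidates[scores.argmax().item()]

# Example usage with any local image file:
# print(generate_story("picture.jpg"))
```

In the paper, style control is additionally handled by plugging stylistic adapters into the language model (e.g., romance- or action-styled variants); the reranking loop above only captures the image-grounding half of the approach.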
