Paper Title

Every picture tells a story: Image-grounded controllable stylistic story generation

Paper Authors

Holy Lovenia, Bryan Wilie, Romain Barraud, Samuel Cahyawijaya, Willy Chung, Pascale Fung

Paper Abstract

Generating a short story out of an image is arduous. Unlike image captioning, story generation from an image poses multiple challenges: preserving the story coherence, appropriately assessing the quality of the story, steering the generated story into a certain style, and addressing the scarcity of image-story pair reference datasets limiting supervision during training. In this work, we introduce Plug-and-Play Story Teller (PPST) and improve image-to-story generation by: 1) alleviating the data scarcity problem by incorporating large pre-trained models, namely CLIP and GPT-2, to facilitate a fluent image-to-text generation with minimal supervision, and 2) enabling a more style-relevant generation by incorporating stylistic adapters to control the story generation. We conduct image-to-story generation experiments with non-styled, romance-styled, and action-styled PPST approaches and compare our generated stories with those of previous work over three aspects, i.e., story coherence, image-story relevance, and style fitness, using both automatic and human evaluation. The results show that PPST improves story coherence and has better image-story relevance, but has yet to be adequately stylistic.
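
To illustrate the plug-and-play idea sketched in the abstract, the snippet below shows one common way to couple a frozen caption/story generator (GPT-2) with a frozen image-text matcher (CLIP) without any paired image-story supervision: GPT-2 proposes candidate continuations and CLIP reranks them against the image. The model names, prompt, and reranking heuristic are illustrative assumptions for a minimal sketch, not the authors' exact PPST implementation, and the stylistic adapters described in the paper are omitted.

```python
# Minimal sketch (assumed, not the official PPST code): GPT-2 proposes
# candidate stories, CLIP scores them against the image, and the best
# match is kept. No image-story training pairs are required.
import torch
from PIL import Image
from transformers import GPT2LMHeadModel, GPT2Tokenizer, CLIPModel, CLIPProcessor

gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
gpt2_tok = GPT2Tokenizer.from_pretrained("gpt2")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def generate_story(image_path: str,
                   prompt: str = "Once upon a time,",
                   num_candidates: int = 8) -> str:
    """Sample several GPT-2 continuations and keep the one CLIP ranks
    as most relevant to the image (zero-shot image-story relevance)."""
    image = Image.open(image_path).convert("RGB")

    # 1) GPT-2 proposes candidate stories from a generic prompt.
    inputs = gpt2_tok(prompt, return_tensors="pt")
    outputs = gpt2.generate(
        **inputs,
        do_sample=True,
        top_p=0.9,
        max_new_tokens=60,
        num_return_sequences=num_candidates,
        pad_token_id=gpt2_tok.eos_token_id,
    )
    candidates = [gpt2_tok.decode(o, skip_special_tokens=True) for o in outputs]

    # 2) CLIP scores each candidate against the image; keep the best match.
    clip_inputs = clip_proc(
        text=candidates, images=image,
        return_tensors="pt", padding=True, truncation=True,
    )
    with torch.no_grad():
        scores = clip(**clip_inputs).logits_per_image[0]  # shape: (num_candidates,)
    return candidates[scores.argmax().item()]

# Example usage with any local image file:
# print(generate_story("picture.jpg"))
```

In the paper, style control is additionally handled by plugging stylistic adapters into the language model (e.g., romance- or action-styled variants); the reranking loop above only captures the image-grounding half of the approach.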
