Paper Title

Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling

Paper Authors

Jiacheng Li, Siliang Tang, Juncheng Li, Jun Xiao, Fei Wu, Shiliang Pu, Yueting Zhuang

Paper Abstract

Visual Storytelling (VIST) is the task of telling a narrative story about a certain topic according to a given photo stream. Existing studies focus on designing complex models that rely on a huge amount of human-annotated data. However, annotation for VIST is extremely costly, and many topics cannot be covered in the training dataset due to the long-tail topic distribution. In this paper, we focus on enhancing the generalization ability of the VIST model by considering the few-shot setting. Inspired by the way humans tell a story, we propose a topic-adaptive storyteller to model the ability of inter-topic generalization. In practice, we apply a gradient-based meta-learning algorithm to multi-modal seq2seq models to endow the model with the ability to adapt quickly from topic to topic. In addition, we propose a prototype encoding structure to model the ability of intra-topic derivation. Specifically, we encode and restore the few training story texts to serve as references that guide generation at inference time. Experimental results show that topic adaptation and the prototype encoding structure mutually benefit the few-shot model on the BLEU and METEOR metrics. A further case study shows that the stories generated after few-shot adaptation are more relevant and expressive.
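The gradient-based meta-learning loop mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' released code: the multi-modal seq2seq storyteller is replaced by a small stand-in network, topic data is faked with random tensors, and all hyperparameters (inner_lr, inner_steps, meta-batch size) are illustrative. It only shows the MAML-style structure in which each sampled task is a topic, the inner loop adapts the model on a few support stories of that topic, and the outer loop updates the shared initialization from the query loss.

```python
# Hypothetical MAML-style topic adaptation sketch (not the paper's code).
import torch
import torch.nn as nn
from torch.func import functional_call  # requires PyTorch >= 2.0

# Stand-in for the multi-modal seq2seq storyteller.
storyteller = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 100))
meta_opt = torch.optim.Adam(storyteller.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
inner_lr, inner_steps = 1e-2, 3


def fake_topic_batch(n=4):
    """Stand-in for a topic's (photo-stream features, story tokens) pairs."""
    return torch.randn(n, 32), torch.randint(0, 100, (n,))


for meta_step in range(100):                       # outer loop over meta-training iterations
    meta_opt.zero_grad()
    for _ in range(4):                             # meta-batch of sampled topics
        support_x, support_y = fake_topic_batch()  # few-shot support set of one topic
        query_x, query_y = fake_topic_batch()      # query set of the same topic

        # Inner loop: adapt a copy of the parameters to this topic.
        params = {k: v for k, v in storyteller.named_parameters()}
        for _ in range(inner_steps):
            loss = loss_fn(functional_call(storyteller, params, (support_x,)), support_y)
            grads = torch.autograd.grad(loss, list(params.values()), create_graph=True)
            params = {k: v - inner_lr * g for (k, v), g in zip(params.items(), grads)}

        # Outer loss: evaluate the adapted parameters on the query set and
        # backpropagate through the inner updates into the shared initialization.
        query_loss = loss_fn(functional_call(storyteller, params, (query_x,)), query_y)
        query_loss.backward()
    meta_opt.step()
```

At inference time, the same inner-loop update is run on the few labeled stories of a new topic before generating, which is the "quick adaptation from topic to topic" the abstract refers to; the prototype encoding structure additionally keeps those few stories as references during generation.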
