Paper Title

Hierarchical Encoders for Modeling and Interpreting Screenplays

Authors

Gayatri Bhat, Avneesh Saluja, Melody Dye, Jan Florjanczyk

Abstract

While natural language understanding of long-form documents is still an open challenge, such documents often contain structural information that can inform the design of models for encoding them. Movie scripts are an example of such richly structured text - scripts are segmented into scenes, which are further decomposed into dialogue and descriptive components. In this work, we propose a neural architecture for encoding this structure, which performs robustly on a pair of multi-label tag classification datasets, without the need for handcrafted features. We add a layer of insight by augmenting an unsupervised "interpretability" module to the encoder, allowing for the extraction and visualization of narrative trajectories. Though this work specifically tackles screenplays, we discuss how the underlying approach can be generalized to a range of structured documents.
