OTExtSum: Extractive Text Summarisation with Optimal Transport

Paper title

OTExtSum: Extractive Text Summarisation with Optimal Transport

Authors

Peggy Tang, Kun Hu, Rui Yan, Lei Zhang, Junbin Gao, Zhiyong Wang

Abstract

Extractive text summarisation aims to select salient sentences from a document to form a short yet informative summary. While learning-based methods have achieved promising results, they have several limitations, such as dependence on expensive training and lack of interpretability. In this paper, we therefore propose a novel non-learning-based method, the Optimal Transport Extractive Summariser (OTExtSum), which for the first time formulates text summarisation as an Optimal Transport (OT) problem. Optimal sentence extraction is conceptualised as obtaining the summary that minimises the transportation cost to a given document with respect to their semantic distributions. This cost is defined by the Wasserstein distance and measures the summary's semantic coverage of the original document. Comprehensive experiments on four challenging and widely used datasets (MultiNews, PubMed, BillSum, and CNN/DM) demonstrate that the proposed method outperforms state-of-the-art non-learning-based methods and several recent learning-based methods in terms of the ROUGE metric.
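The idea in the abstract, choosing the sentences whose semantic distribution is cheapest to transport to the document's, can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the 2-D token embeddings in `EMB` are invented for the example, normalised token counts stand in for the semantic distributions, the Wasserstein distance is approximated with entropy-regularised Sinkhorn iterations, and candidate summaries are enumerated by brute force because the abstract does not describe the search strategy. The names `distribution`, `sinkhorn_cost` and `best_summary` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical 2-D token embeddings, invented for this sketch only.
EMB = {
    "stocks": (0.0, 0.0), "markets": (0.1, 0.0),
    "rain": (1.0, 0.0), "storm": (1.0, 0.1),
}


def distribution(tokens):
    """Normalised token frequencies as a dict: token -> probability."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def sinkhorn_cost(p, q, eps=0.1, iters=200):
    """Entropy-regularised approximation of the OT cost between p and q."""
    rows, cols = list(p), list(q)
    n, m = len(rows), len(cols)
    # Ground cost: Euclidean distance between the toy token embeddings.
    cost = [[math.dist(EMB[r], EMB[c]) for c in cols] for r in rows]
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [p[rows[i]] / sum(kernel[i][j] * v[j] for j in range(m))
             for i in range(n)]
        v = [q[cols[j]] / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m)]
    # Transport cost of the resulting plan u_i * K_ij * v_j.
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))


def best_summary(sentences, k):
    """Brute-force search for the k sentences with the lowest OT cost."""
    doc = distribution([tok for sent in sentences for tok in sent])

    def cost_of(indices):
        summary = distribution([tok for i in indices for tok in sentences[i]])
        return sinkhorn_cost(doc, summary)

    best = min(combinations(range(len(sentences)), k), key=cost_of)
    return best, cost_of(best)


SENTENCES = [["stocks", "markets"], ["rain", "storm"], ["stocks", "rain"]]
best, cost = best_summary(SENTENCES, k=1)
print(best, round(cost, 3))
```

In this toy document the third sentence (index 2) is selected, because it touches both topics and so leaves little mass to move across the embedding space. Brute-force enumeration grows combinatorially with the number of sentences, so it only serves to make the objective concrete.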
