Unsupervised Opinion Summarization with Content Planning

Paper Title

Unsupervised Opinion Summarization with Content Planning

Authors

Amplayo, Reinald Kim; Angelidis, Stefanos; Lapata, Mirella

Abstract

The recent success of deep learning techniques for abstractive summarization is predicated on the availability of large-scale datasets. When summarizing reviews (e.g., for products or movies), such training data is neither available nor can be easily sourced, motivating the development of methods which rely on synthetic datasets for supervised training. We show that explicitly incorporating content planning in a summarization model not only yields output of higher quality, but also allows the creation of synthetic datasets which are more natural, resembling real world document-summary pairs. Our content plans take the form of aspect and sentiment distributions which we induce from data without access to expensive annotations. Synthetic datasets are created by sampling pseudo-reviews from a Dirichlet distribution parametrized by our content planner, while our model generates summaries based on input reviews and induced content plans. Experimental results on three domains show that our approach outperforms competitive models in generating informative, coherent, and fluent summaries that capture opinion consensus.
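The abstract says synthetic training pairs are built by sampling pseudo-reviews from a Dirichlet distribution parametrized by the content planner. The Python below is a minimal sketch of that idea under loudly stated assumptions. The content plan is reduced to a single aspect distribution (the paper also uses sentiment). The Dirichlet concentration is the plan multiplied by a hypothetical `scale` factor. Each sampled distribution is matched to the closest unused review in a pool by KL divergence. The function names, the `scale` parameter, and the KL matching rule are illustrative choices, not the authors' implementation.

```python
import math
import random


def sample_dirichlet(concentration, rng):
    """Draw one sample from Dirichlet(concentration) by normalizing Gamma(a_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(draws)
    return [d / total for d in draws]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; eps avoids log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def sample_pseudo_reviews(summary_plan, review_pool, k, scale=10.0, seed=0):
    """Hypothetical helper: pick k pseudo-reviews for one pseudo-summary.

    summary_plan: aspect distribution of the pseudo-summary (assumed content plan).
    review_pool:  list of (review_text, aspect_distribution) pairs.
    scale:        assumed concentration multiplier; larger values keep the
                  sampled distributions closer to summary_plan.
    """
    rng = random.Random(seed)
    # gammavariate requires alpha > 0, so floor each concentration parameter.
    concentration = [max(scale * p, 1e-3) for p in summary_plan]
    available = list(review_pool)
    chosen = []
    for _ in range(min(k, len(available))):
        # Sample a target aspect distribution around the summary's plan.
        target = sample_dirichlet(concentration, rng)
        # Choose the unused review whose aspect distribution is closest to it.
        best = min(available, key=lambda review: kl_divergence(target, review[1]))
        chosen.append(best[0])
        available.remove(best)
    return chosen
```

With a plan concentrated on the first aspect and a large `scale`, the sampled targets stay close to the plan, so a call such as `sample_pseudo_reviews([0.8, 0.2], pool, k=2, scale=1000.0)` should favor reviews weighted toward that aspect. Smaller `scale` values spread the Dirichlet samples out and yield more varied sets of pseudo-reviews for the same pseudo-summary.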
