Paper Title

Massive Multi-Document Summarization of Product Reviews with Weak Supervision

Paper Authors

Ori Shapira, Ran Levy

Paper Abstract

Product reviews summarization is a type of Multi-Document Summarization (MDS) task in which the summarized document sets are often far larger than in traditional MDS (up to tens of thousands of reviews). We highlight this difference and coin the term "Massive Multi-Document Summarization" (MMDS) to denote an MDS task that involves hundreds of documents or more. Prior work on product reviews summarization considered small samples of the reviews, mainly due to the difficulty of handling massive document sets. We show that summarizing small samples can result in loss of important information and provide misleading evaluation results. We propose a schema for summarizing a massive set of reviews on top of a standard summarization algorithm. Since writing large volumes of reference summaries needed for advanced neural network models is impractical, our solution relies on weak supervision. Finally, we propose an evaluation scheme that is based on multiple crowdsourced reference summaries and aims to capture the massive review collection. We show that an initial implementation of our schema significantly improves over several baselines in ROUGE scores, and exhibits strong coherence in a manual linguistic quality assessment.
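The abstract mentions that evaluation is based on ROUGE scores against multiple crowdsourced reference summaries. As a rough illustration only, the sketch below shows how a multi-reference ROUGE-style score is commonly computed: n-gram overlap against each reference, keeping the best match. The function names, the simple whitespace tokenization, and the max-over-references convention are assumptions of this sketch, not details taken from the paper, whose exact evaluation protocol is described in the full text.

```python
from collections import Counter

def rouge_n_f1(candidate: str, reference: str, n: int = 1) -> float:
    """F1-style n-gram overlap between a candidate summary and one reference."""
    def ngrams(text: str, n: int) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def multi_reference_rouge(candidate: str, references: list[str], n: int = 1) -> float:
    """Score a system summary against several crowdsourced references,
    keeping the best-matching one (a common multi-reference convention;
    averaging over references is another option)."""
    return max(rouge_n_f1(candidate, r, n) for r in references)

if __name__ == "__main__":
    # Hypothetical example data, for illustration only.
    system_summary = "battery life is great but the screen scratches easily"
    crowd_references = [
        "reviewers praise the battery life but complain the screen scratches",
        "great battery; fragile screen",
    ]
    print(multi_reference_rouge(system_summary, crowd_references, n=1))
```

Using several references and taking the best match makes the score less sensitive to any single annotator's wording, which matters when the references are crowdsourced and meant to capture a massive, diverse review collection.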
