Paper Title

Automatic Summarization of Open-Domain Podcast Episodes

Authors

Kaiqiang Song, Chen Li, Xiaoyang Wang, Dong Yu, Fei Liu

Abstract

We present implementation details of our abstractive summarizers that achieve competitive results on the Podcast Summarization task of TREC 2020. A concise textual summary that captures important information is crucial for users to decide whether to listen to the podcast. Prior work focuses primarily on learning contextualized representations. Instead, we investigate several less-studied aspects of neural abstractive summarization, including (i) the importance of selecting important segments from transcripts to serve as input to the summarizer; (ii) striking a balance between the amount and quality of training instances; (iii) the appropriate summary length and start/end points. We highlight the design considerations behind our system and offer key insights into the strengths and weaknesses of neural abstractive systems. Our results suggest that identifying important segments from transcripts to use as input to an abstractive summarizer is advantageous for summarizing long documents. Our best system achieves a quality rating of 1.559 judged by NIST evaluators---an absolute increase of 0.268 (+21%) over the creator descriptions.
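The abstract's point (i) argues for choosing important transcript segments to fit an abstractive model's input budget, rather than truncating the transcript. As a minimal illustration of that idea (not the paper's actual selector, whose details are not given here), the sketch below scores segments by average TF-ISF, i.e. term frequency weighted by inverse segment frequency, keeps the highest-scoring segments until a hypothetical token budget is filled, and returns them in their original order:

```python
# Hypothetical segment selector: rank transcript segments by average TF-ISF
# and keep top-scoring ones (in original order) within a token budget.
# This is an illustrative sketch, not the system described in the paper.
from collections import Counter
import math
import re


def select_segments(segments, budget_tokens=1024):
    """Return a subset of `segments` whose total length fits `budget_tokens`."""
    tokenized = [re.findall(r"\w+", s.lower()) for s in segments]
    n = len(segments)
    # segment frequency of each word (how many segments contain it)
    df = Counter(w for toks in tokenized for w in set(toks))
    # inverse segment frequency: words spread across fewer segments score higher
    isf = {w: math.log(n / df[w]) for w in df}

    def score(toks):
        if not toks:
            return 0.0
        tf = Counter(toks)
        return sum(tf[w] * isf[w] for w in tf) / len(toks)

    ranked = sorted(range(n), key=lambda i: score(tokenized[i]), reverse=True)
    chosen, used = [], 0
    for i in ranked:
        cost = len(tokenized[i])
        if used + cost <= budget_tokens:
            chosen.append(i)
            used += cost
    # restore original transcript order for the summarizer's input
    return [segments[i] for i in sorted(chosen)]
```

In practice the selected segments would be concatenated and fed to the abstractive summarizer in place of a naively truncated transcript.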
