Dataset for Automatic Summarization of Russian News

Paper title

Dataset for Automatic Summarization of Russian News

Author

Gusev, Ilya

Abstract

Automatic text summarization has been studied in a variety of domains and languages. However, this does not hold for the Russian language. To overcome this issue, we present Gazeta, the first dataset for summarization of Russian news. We describe the properties of this dataset and benchmark several extractive and abstractive models. We demonstrate that the dataset is a valid task for methods of text summarization for Russian. Additionally, we prove the pretrained mBART model to be useful for Russian text summarization.

Paper title