Paper Title
Introducing the Welsh Text Summarisation Dataset and Baseline Systems
Authors
Abstract
Welsh is an official language in Wales and is spoken by an estimated 884,300 people (29.2% of the population of Wales). Despite this status and an estimated increase in speaker numbers since the last (2011) census, Welsh remains a minority language undergoing revitalisation and promotion by the Welsh Government and relevant stakeholders. As part of the effort to increase the availability of Welsh digital technology, this paper introduces the first Welsh summarisation dataset, which we provide freely for research purposes to help advance work on Welsh text summarisation. The dataset was created by Welsh speakers, who manually summarised Welsh Wikipedia articles. In addition, the paper discusses the implementation and evaluation of different summarisation systems for Welsh. The summarisation systems and results will serve as benchmarks for the development of summarisers in other minority language contexts.