Paper title
MN-DS: A Multilabeled News Dataset for News Articles Hierarchical Classification
Authors
Abstract
This article presents a dataset of 10,917 news articles, collected between 1 January 2019 and 31 December 2019 and annotated with hierarchical news categories. We manually labeled the articles using a hierarchical taxonomy with 17 first-level and 109 second-level categories. The dataset can be used to train machine learning models that automatically classify news articles by topic. It may also help researchers working on news structuring, classification, and the prediction of future events from published news.
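To make the two-level labeling concrete, the following is a minimal sketch of how predictions against such a taxonomy might be scored at both levels. The category names, the `PARENT_OF` mapping, and the `hierarchical_accuracy` helper are hypothetical placeholders chosen for illustration; they are not the dataset's actual labels or the authors' evaluation code.

```python
# Hypothetical two-level taxonomy: second-level category -> first-level parent.
# These names are illustrative placeholders, not the dataset's real labels.
PARENT_OF = {
    "elections": "politics",
    "government policy": "politics",
    "football": "sport",
    "tennis": "sport",
}


def first_level(second_level):
    """Return the first-level parent of a second-level category."""
    return PARENT_OF[second_level]


def hierarchical_accuracy(gold, predicted):
    """Return (first-level accuracy, second-level accuracy) for two label lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("label lists must not be empty")
    n = len(gold)
    pairs = list(zip(gold, predicted))
    # Exact match on the fine-grained (second-level) label.
    level2 = sum(g == p for g, p in pairs) / n
    # Match on the coarse (first-level) parent only.
    level1 = sum(first_level(g) == first_level(p) for g, p in pairs) / n
    return level1, level2


gold = ["elections", "football", "tennis", "government policy"]
predicted = ["government policy", "football", "football", "government policy"]
level1_acc, level2_acc = hierarchical_accuracy(gold, predicted)
print(level1_acc, level2_acc)
```

Scoring both levels separates coarse errors (wrong first-level topic) from fine-grained ones (right topic, wrong second-level category), which a single flat accuracy figure would hide.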