Paper title
A Survey on Text Classification: From Shallow to Deep Learning
Authors
Abstract
Text classification is the most fundamental and essential task in natural language processing. The last decade has seen a surge of research in this area due to the unprecedented success of deep learning. Numerous methods, datasets, and evaluation metrics have been proposed in the literature, raising the need for a comprehensive and updated survey. This paper fills the gap by reviewing state-of-the-art approaches from 1961 to 2021, covering the progression from traditional models to deep learning. We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification. We then discuss each of these categories in detail, covering both the technical developments and the benchmark datasets that support tests of predictions. The survey also provides a comprehensive comparison of the different techniques and identifies the pros and cons of various evaluation metrics. Finally, we conclude by summarizing key implications, future research directions, and the challenges facing the research area.