Title
Large-Scale Evaluation of Keyphrase Extraction Models
Authors
Abstract
Keyphrase extraction models are usually evaluated under different, not directly comparable, experimental setups. As a result, it remains unclear how well proposed models actually perform, and how they compare to each other. In this work, we address this issue by presenting a systematic large-scale analysis of state-of-the-art keyphrase extraction models involving multiple benchmark datasets from various sources and domains. Our main results reveal that state-of-the-art models are in fact still challenged by simple baselines on some datasets. We also present new insights about the impact of using author- or reader-assigned keyphrases as a proxy for gold standard, and give recommendations for strong baselines and reliable benchmark datasets.