Unsupervised Keyphrase Extraction via Interpretable Neural Networks

Paper Title

Unsupervised Keyphrase Extraction via Interpretable Neural Networks

Authors

Rishabh Joshi, Vidhisha Balachandran, Emily Saldanha, Maria Glenski, Svitlana Volkova, Yulia Tsvetkov

Abstract

Keyphrase extraction aims at automatically extracting a list of "important" phrases representing the key concepts in a document. Prior approaches for unsupervised keyphrase extraction resorted to heuristic notions of phrase importance via embedding clustering or graph centrality, requiring extensive domain expertise. Our work presents a simple alternative approach which defines keyphrases as document phrases that are salient for predicting the topic of the document. To this end, we propose INSPECT -- an approach that uses self-explaining models for identifying influential keyphrases in a document by measuring the predictive impact of input phrases on the downstream task of the document topic classification. We show that this novel method not only alleviates the need for ad-hoc heuristics but also achieves state-of-the-art results in unsupervised keyphrase extraction in four datasets across two domains: scientific publications and news articles.
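The core idea, ranking phrases by how much they influence a topic classifier's prediction, can be illustrated with a small sketch. Note that INSPECT itself uses self-explaining neural models; the code below swaps in a simpler, different technique (leave-one-out occlusion) over a toy bag-of-words topic classifier, purely to show what "predictive impact of a phrase" means. The topic weights, the names `TOPIC_WEIGHTS`, `topic_probs`, and `rank_phrases`, and the candidate spans are all hypothetical and not taken from the paper.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy topic classifier: per-topic word weights scored with a softmax.
# It stands in for a trained neural topic classifier; the weights are made up.
TOPIC_WEIGHTS: Dict[str, Dict[str, float]] = {
    "nlp": {"keyphrase": 2.0, "extraction": 1.5, "language": 1.0},
    "vision": {"image": 2.0, "pixel": 1.5, "segmentation": 1.0},
}


def topic_probs(tokens: List[str]) -> Dict[str, float]:
    """Return a softmax distribution over topics from summed word weights."""
    scores = {t: sum(w.get(tok, 0.0) for tok in tokens) for t, w in TOPIC_WEIGHTS.items()}
    z = max(scores.values())  # subtract the max for numerical stability
    exps = {t: math.exp(s - z) for t, s in scores.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}


def rank_phrases(tokens: List[str], spans: List[Tuple[int, int]]) -> List[Tuple[str, float]]:
    """Score each candidate span [start, end) by how much deleting it lowers the
    probability of the document's predicted topic (occlusion), highest first."""
    base = topic_probs(tokens)
    topic = max(base, key=base.get)
    ranked = []
    for start, end in spans:
        occluded = tokens[:start] + tokens[end:]
        drop = base[topic] - topic_probs(occluded)[topic]
        ranked.append((" ".join(tokens[start:end]), drop))
    return sorted(ranked, key=lambda p: p[1], reverse=True)


if __name__ == "__main__":
    doc = "we study keyphrase extraction for scientific language".split()
    # Hypothetical candidate phrases given as token spans.
    for phrase, score in rank_phrases(doc, [(2, 4), (5, 7), (0, 2)]):
        print(f"{phrase}: {score:.3f}")
```

Here "keyphrase extraction" ranks first because removing it causes the largest drop in the predicted topic's probability, while a topic-neutral phrase such as "we study" scores zero. A self-explaining model would instead produce such phrase-level relevance scores as part of its forward pass rather than by re-running the classifier once per candidate.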
