Paper Title

Corpus-level and Concept-based Explanations for Interpretable Document Classification

Authors

Tian Shi, Xuchao Zhang, Ping Wang, Chandan K. Reddy

Abstract

Using attention weights to identify information that is important for a model's decision-making is a popular approach to interpreting attention-based neural networks. In practice, this is commonly realized by generating a heat-map for each document based on its attention weights. However, this interpretation method is fragile, and contradictory examples are easy to find. In this paper, we propose a corpus-level explanation approach that aims to capture causal relationships between keywords and model predictions by learning the importance of keywords for predicted labels across a training corpus, based on attention weights. Building on this idea, we further propose a concept-based explanation method that can automatically learn higher-level concepts and their importance to model prediction tasks. Our concept-based explanation method is built upon a novel Abstraction-Aggregation Network, which automatically clusters important keywords during an end-to-end training process. We apply these methods to the document classification task and show that they are powerful in extracting semantically meaningful keywords and concepts. Our consistency analysis results, based on an attention-based Naïve Bayes classifier, also demonstrate that these keywords and concepts are important for model predictions.
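The corpus-level idea described above can be illustrated with a minimal sketch: aggregate per-document attention weights for each token, grouped by the model's predicted label, to obtain corpus-level keyword importance scores. This is an illustrative aggregation only; the paper's actual scoring and the function and variable names below are assumptions, not the authors' implementation.

```python
from collections import defaultdict

def corpus_level_keyword_importance(documents):
    """Aggregate per-document attention weights into corpus-level
    keyword-importance scores for each predicted label.

    `documents` is a list of (tokens, attention_weights, predicted_label)
    triples. This is a hypothetical interface for illustration; the
    paper's actual scoring may differ.
    """
    scores = defaultdict(float)  # (label, token) -> summed attention weight
    counts = defaultdict(int)    # label -> number of documents with that label
    for tokens, weights, label in documents:
        counts[label] += 1
        for tok, w in zip(tokens, weights):
            scores[(label, tok)] += w
    # Normalize by the number of documents predicted with each label,
    # so scores are comparable across labels of different frequency.
    return {(label, tok): s / counts[label]
            for (label, tok), s in scores.items()}

# Toy corpus: tokenized documents with (made-up) attention weights
# and the label the model predicted for each document.
docs = [
    (["great", "plot", "boring"], [0.6, 0.3, 0.1], "pos"),
    (["great", "acting"],         [0.7, 0.3],      "pos"),
    (["boring", "plot"],          [0.8, 0.2],      "neg"),
]
importance = corpus_level_keyword_importance(docs)
# "great" accumulates high corpus-level importance for the "pos" label,
# even though its weight varies from document to document.
```

In contrast to a single-document heat-map, a keyword only scores highly here if the model attends to it consistently across many documents sharing a predicted label, which is what makes the explanation more robust to per-document contradictory examples.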
