Paper Title
Human-Centered Concept Explanations for Neural Networks
Paper Authors
Paper Abstract
Understanding complex machine learning models such as deep neural networks through explanations is crucial in many applications. Many explanations stem from the model's perspective and may not effectively communicate why the model makes its predictions at the right level of abstraction. For example, assigning importance weights to individual pixels in an image can only express which parts of that particular image are important to the model, whereas humans may prefer an explanation that accounts for the prediction in terms of concepts. In this work, we review the emerging area of concept-based explanations. We start by introducing concept explanations, including the class of Concept Activation Vectors (CAVs), which characterize concepts using vectors in appropriate spaces of neural activations, and discuss properties that make concepts useful as well as approaches to measuring the usefulness of concept vectors. We then discuss approaches to automatically extract concepts and to address some of their caveats. Finally, we discuss case studies that showcase the utility of such concept-based explanations in synthetic settings and real-world applications.
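To make the CAV idea above concrete, the following is a minimal, illustrative sketch of the standard linear-probe recipe: fit a linear classifier that separates a concept's layer activations from random counterexamples, take the vector normal to its decision boundary as the concept activation vector, and score the concept by how often the class gradient points along that vector. The function names, the synthetic stand-in data, and the use of scikit-learn's LogisticRegression are assumptions made for illustration, not the paper's implementation.

```python
# Illustrative sketch of a Concept Activation Vector (CAV) and a TCAV-style
# score, assuming activations and class gradients have already been extracted
# from a chosen layer of the network (not shown here).
import numpy as np
from sklearn.linear_model import LogisticRegression


def compute_cav(concept_acts: np.ndarray, random_acts: np.ndarray) -> np.ndarray:
    """Fit a linear probe separating concept vs. random activations.

    concept_acts, random_acts: arrays of shape (n_examples, activation_dim)
    holding a layer's activations. Returns the unit-norm CAV, i.e. the vector
    normal to the probe's separating hyperplane.
    """
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    cav = clf.coef_.ravel()  # normal to the decision boundary
    return cav / np.linalg.norm(cav)


def tcav_score(class_grads: np.ndarray, cav: np.ndarray) -> float:
    """Fraction of examples whose class prediction increases along the CAV.

    class_grads: gradients of the class logit w.r.t. the same layer's
    activations, shape (n_examples, activation_dim). A score near 1 suggests
    the model uses the concept positively for this class.
    """
    directional_derivs = class_grads @ cav
    return float(np.mean(directional_derivs > 0))


if __name__ == "__main__":
    # Synthetic stand-ins for layer activations and class gradients.
    rng = np.random.default_rng(0)
    d = 64
    concept_acts = rng.normal(1.0, 1.0, size=(50, d))
    random_acts = rng.normal(0.0, 1.0, size=(50, d))
    cav = compute_cav(concept_acts, random_acts)
    grads = rng.normal(0.5, 1.0, size=(30, d))
    print("TCAV score:", tcav_score(grads, cav))
```

In practice the CAV is computed per layer and per concept, and the score is compared against CAVs trained on random concepts to check that the measured sensitivity is not an artifact of the probe.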