Title

Geometrically Guided Integrated Gradients

Authors

Md Mahfuzur Rahman, Noah Lewis, Sergey Plis

Abstract

Interpretability methods for deep neural networks mainly focus on the sensitivity of the class score with respect to the original or perturbed input, usually measured using actual or modified gradients. Some methods also use a model-agnostic approach to understanding the rationale behind every prediction. In this paper, we argue and demonstrate that local geometry of the model parameter space relative to the input can also be beneficial for improved post-hoc explanations. To achieve this goal, we introduce an interpretability method called "geometrically-guided integrated gradients" that builds on top of the gradient calculation along a linear path as traditionally used in integrated gradient methods. However, instead of integrating gradient information, our method explores the model's dynamic behavior from multiple scaled versions of the input and captures the best possible attribution for each input. We demonstrate through extensive experiments that the proposed approach outperforms vanilla and integrated gradients in subjective and quantitative assessment. We also propose a "model perturbation" sanity check to complement the traditionally used "model randomization" test.
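The abstract contrasts classic integrated gradients, which average gradients along a linear path from a baseline to the input, with the proposed approach, which inspects the gradients at multiple scaled versions of the input and keeps the "best" one per feature. The sketch below illustrates that contrast on a toy quadratic model; the per-feature largest-magnitude-gradient rule is only one plausible reading of "capturing the best possible attribution", not the paper's exact selection criterion, and the model, function names, and step count are illustrative assumptions.

```python
import numpy as np

def model_grad(x, w):
    # Toy differentiable model f(x) = sum(w * x**2); its gradient is 2*w*x.
    return 2.0 * w * x

def integrated_gradients(x, baseline, w, steps=50):
    # Classic IG: average the gradients along the straight-line path from
    # the baseline to the input, then scale by (x - baseline).
    alphas = np.linspace(0.0, 1.0, steps)
    grads = np.stack([model_grad(baseline + a * (x - baseline), w)
                      for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

def geometrically_guided_attribution(x, baseline, w, steps=50):
    # Hedged sketch of the idea in the abstract: instead of integrating,
    # examine the model's gradients at many scaled inputs and, per feature,
    # keep the gradient with the largest magnitude seen along the path.
    # (Assumed selection rule for illustration; the paper's rule may differ.)
    alphas = np.linspace(0.0, 1.0, steps)
    grads = np.stack([model_grad(baseline + a * (x - baseline), w)
                      for a in alphas])
    idx = np.argmax(np.abs(grads), axis=0)          # best step per feature
    best = grads[idx, np.arange(x.size)]            # pick those gradients
    return (x - baseline) * best
```

On this quadratic toy model the gradient grows along the path, so the selected "best" gradient sits at the input itself and the resulting attributions are exactly twice the IG attributions; on a real network the selected step varies per feature, which is where the two methods genuinely diverge.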
