Paper Title
Sampling Based On Natural Image Statistics Improves Local Surrogate Explainers
Paper Authors
Paper Abstract
Many problems in computer vision have recently been tackled using models whose predictions cannot be easily interpreted, most commonly deep neural networks. Surrogate explainers are a popular post-hoc interpretability method to further understand how a model arrives at a particular prediction. By training a simple, more interpretable model to locally approximate the decision boundary of a non-interpretable system, we can estimate the relative importance of the input features on the prediction. Focusing on images, surrogate explainers, e.g., LIME, generate a local neighbourhood around a query image by sampling in an interpretable domain. However, these interpretable domains have traditionally been derived exclusively from the intrinsic features of the query image, not taking into consideration the manifold of the data the non-interpretable model has been exposed to in training (or more generally, the manifold of real images). This leads to suboptimal surrogates trained on potentially low probability images. We address this limitation by aligning the local neighbourhood on which the surrogate is trained with the original training data distribution, even when this distribution is not accessible. We propose two approaches to do so, namely (1) altering the method for sampling the local neighbourhood and (2) using perceptual metrics to convey some of the properties of the distribution of natural images.
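To make the mechanism concrete, here is a minimal sketch of a LIME-style surrogate explainer in which the usual kernel weighting over mask distance is replaced by a perceptual similarity, loosely in the spirit of approach (2). All names are illustrative: `black_box` is a hypothetical classifier exposing a `predict_proba` interface, SSIM stands in for whichever perceptual metric the paper actually employs, and the snippet assumes numpy, scikit-learn, and scikit-image ≥ 0.19 (for the `channel_axis` argument). It is a sketch under those assumptions, not the authors' implementation.

```python
# Sketch: LIME-style surrogate with a perceptual (SSIM) sample weighting.
import numpy as np
from skimage.segmentation import slic
from skimage.metrics import structural_similarity as ssim
from sklearn.linear_model import Ridge

def explain(image, black_box, target_class, n_samples=500, seed=0):
    """Return one importance weight per superpixel for `target_class`."""
    rng = np.random.default_rng(seed)
    segments = slic(image, n_segments=50)       # interpretable domain
    seg_ids = np.unique(segments)

    # Sample binary masks: 1 keeps a superpixel, 0 greys it out.
    Z = rng.integers(0, 2, size=(n_samples, len(seg_ids)))
    Z[0, :] = 1                                  # include the query image itself

    baseline = image.mean(axis=(0, 1))           # per-channel grey-out colour
    drange = float(image.max()) - float(image.min())
    perturbed, weights = [], []
    for z in Z:
        im = image.copy()
        for sid, keep in zip(seg_ids, z):
            if not keep:
                im[segments == sid] = baseline
        perturbed.append(im)
        # Perceptual proximity to the query image as the sample weight;
        # standard LIME would use an exponential kernel over cosine/L2 distance.
        weights.append(ssim(image, im, channel_axis=-1, data_range=drange))

    # Black-box probability of the target class for each perturbed image
    # (black_box.predict_proba is a hypothetical interface).
    y = black_box.predict_proba(np.stack(perturbed))[:, target_class]

    # Weighted linear surrogate in the binary mask space.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(Z, y, sample_weight=np.clip(weights, 0.0, None))
    return surrogate.coef_                       # per-superpixel importance
```

The design choice the sketch highlights: standard LIME weights neighbourhood samples by their distance to the query in the binary mask space, which is blind to how plausible the perturbed images are as natural images. Swapping in a perceptual image metric is one way to bias the surrogate's training toward perturbations that remain close to the natural image distribution, which is the limitation the abstract identifies.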