Paper Title

Explaining Image Classifiers Using Contrastive Counterfactuals in Generative Latent Spaces

Paper Authors

Kamran Alipour, Aditya Lahiri, Ehsan Adeli, Babak Salimi, Michael Pazzani

Paper Abstract

Despite their high accuracies, modern complex image classifiers cannot be trusted for sensitive tasks due to their unknown decision-making process and potential biases. Counterfactual explanations are very effective in providing transparency for these black-box algorithms. Nevertheless, generating counterfactuals that can have a consistent impact on classifier outputs and yet expose interpretable feature changes is a very challenging task. We introduce a novel method to generate causal and yet interpretable counterfactual explanations for image classifiers using pretrained generative models without any re-training or conditioning. The generative models in this technique are not bound to be trained on the same data as the target classifier. We use this framework to obtain contrastive and causal sufficiency and necessity scores as global explanations for black-box classifiers. On the task of face attribute classification, we show how different attributes influence the classifier output by providing both causal and contrastive feature attributions, and the corresponding counterfactual images.
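
The abstract describes estimating contrastive sufficiency and necessity scores by intervening on the latent code of a pretrained generator and observing how the black-box classifier's output responds. Below is a minimal, illustrative Python sketch of that general recipe, not the paper's implementation: `G`, `f`, `intervene`, and the attribute `direction` in the demo are all hypothetical stand-ins, where `G` maps a latent code to an image, `f` is the black-box classifier, and `intervene` shifts the latent along an assumed attribute direction.

```python
import numpy as np

def necessity_score(G, f, latents, intervene, label=1):
    """Among inputs the classifier assigns `label`, the fraction whose
    prediction flips after the latent-space intervention (a sketch of a
    counterfactual necessity estimate)."""
    base = np.array([f(G(z)) for z in latents])
    after = np.array([f(G(intervene(z))) for z in latents])
    positives = base == label
    return float(np.mean(after[positives] != label)) if positives.any() else 0.0

def sufficiency_score(G, f, latents, intervene, label=1):
    """Among inputs the classifier does not assign `label`, the fraction
    whose prediction becomes `label` after the intervention (a sketch of
    a counterfactual sufficiency estimate)."""
    base = np.array([f(G(z)) for z in latents])
    after = np.array([f(G(intervene(z))) for z in latents])
    negatives = base != label
    return float(np.mean(after[negatives] == label)) if negatives.any() else 0.0

if __name__ == "__main__":
    # Toy demo with synthetic stand-ins: the "generator" is the identity
    # map and the "classifier" thresholds a projection onto a hypothetical
    # attribute direction, so the scores are easy to sanity-check.
    rng = np.random.default_rng(0)
    direction = rng.normal(size=8)
    G = lambda z: z
    f = lambda x: int(x @ direction > 0)
    latents = rng.normal(size=(200, 8))
    remove_attr = lambda z: z - 2.0 * direction  # push against the attribute
    add_attr = lambda z: z + 2.0 * direction     # push toward the attribute
    print("necessity:", necessity_score(G, f, latents, remove_attr))
    print("sufficiency:", sufficiency_score(G, f, latents, add_attr))
```

In the paper's actual setting, the intervention would edit an interpretable attribute (e.g., a face attribute) in the latent space of a pretrained generative model, and the scores would be aggregated over the data distribution to serve as global explanations of the classifier.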
