Paper Title
An Additive Instance-Wise Approach to Multi-class Model Interpretation
Paper Authors
Abstract
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system. A large number of interpretation methods focus on identifying explanatory input features, which generally fall into two main categories: attribution and selection. A popular attribution-based approach is to exploit local neighborhoods for learning instance-specific explainers in an additive manner. The process is thus inefficient and susceptible to poorly-conditioned samples. Meanwhile, many selection-based methods directly optimize local feature distributions in an instance-wise training framework, thereby being capable of leveraging global information from other inputs. However, they can only interpret single-class predictions, and many suffer from inconsistency across different settings due to a strict reliance on a pre-defined number of selected features. This work exploits the strengths of both methods and proposes a framework for learning local explanations simultaneously for multiple target classes. Our model explainer significantly outperforms additive and instance-wise counterparts in faithfulness, with more compact and comprehensible explanations. We also demonstrate the capacity to select stable and important features through extensive experiments on various datasets and black-box model architectures.
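To make the contrast concrete, the attribution-based approach the abstract describes can be sketched as a LIME-style local surrogate: sample a neighborhood around one instance, query the black box, and fit a weighted additive (linear) model whose coefficients serve as per-feature attributions. This is a minimal illustration of that general idea, not the paper's proposed method; the black-box function and all parameter choices below are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def black_box(X):
    # Stand-in black box: a fixed nonlinear function of two features.
    return np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2

def local_additive_explainer(x, n_samples=500, sigma=0.5):
    """Attribute the black-box prediction at x by fitting a weighted
    linear surrogate on perturbed neighbors (one explainer per instance,
    which is what makes the scheme instance-specific but inefficient)."""
    d = x.shape[0]
    Z = x + sigma * rng.standard_normal((n_samples, d))        # local neighborhood
    y = black_box(Z)
    w = np.exp(-np.sum((Z - x) ** 2, axis=1) / (2 * sigma ** 2))  # proximity weights
    A = np.hstack([Z - x, np.ones((n_samples, 1))])            # additive (linear) model
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[:-1]                                           # per-feature attributions

x0 = np.array([0.0, 1.0])
attr = local_additive_explainer(x0)
# Locally, d/dx0 sin(x0) = cos(0) = 1 and d/dx1 0.5*x1^2 = x1 = 1,
# so both attributions come out roughly 1 (slightly shrunk by smoothing).
print(attr)
```

Note that the surrogate must be re-fit from scratch for every instance, and an ill-conditioned neighborhood (e.g. near-duplicate perturbations) destabilizes the least-squares fit; these are precisely the inefficiency and conditioning issues the abstract attributes to this family of methods.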