Title
Relational Local Explanations
Authors
Abstract
The majority of existing post-hoc explanation approaches for machine learning models produce independent, per-variable feature attribution scores, ignoring a critical inherent characteristic of homogeneously structured data, such as visual or text data: there exist latent inter-variable relationships between features. In response, we develop a novel model-agnostic and permutation-based feature attribution approach based on the relational analysis between input variables. As a result, we are able to gain a broader insight into the predictions and decisions of machine learning models. Experimental evaluations of our framework, in comparison with state-of-the-art attribution techniques on various setups involving both image and text data modalities, demonstrate the effectiveness and validity of our method.
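To make the contrast in the abstract concrete: a minimal sketch of generic permutation-based attribution, not the paper's algorithm, showing the difference between independent per-variable scores and scores over feature pairs, which capture inter-variable relationships. The function names and metric are illustrative assumptions.

```python
# Hedged illustration only -- a generic permutation-importance sketch,
# not the method proposed in the paper. It scores either single feature
# columns or pairs of columns by the drop in model quality when those
# columns are shuffled, contrasting per-variable with relational scores.
import itertools
import numpy as np


def permutation_scores(model, X, y, metric, pairs=False, rng=None):
    """Score feature groups by the decrease in `metric` after shuffling.

    model  : any object with a .predict(X) method (illustrative assumption)
    metric : callable(y_true, y_pred) -> float, higher is better
    pairs  : if True, score all feature pairs instead of single features
    """
    rng = np.random.default_rng(rng)
    base = metric(y, model.predict(X))
    groups = (itertools.combinations(range(X.shape[1]), 2)
              if pairs else [(j,) for j in range(X.shape[1])])
    scores = {}
    for group in groups:
        Xp = X.copy()
        for j in group:
            # Shuffle each column in the group, breaking its link to y.
            Xp[:, j] = rng.permutation(Xp[:, j])
        # Positive score = the group mattered; ~0 = the model ignored it.
        scores[group] = base - metric(y, model.predict(Xp))
    return scores
```

With `pairs=True`, a pair whose joint score exceeds the sum of its individual scores hints at an interaction between the two variables, which is the kind of relational structure per-variable attributions miss.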