Paper Title

Influence-Driven Explanations for Bayesian Network Classifiers

Authors

Antonio Rago, Emanuele Albini, Pietro Baroni, Francesca Toni

Abstract

One of the most pressing issues in AI in recent years has been the need to address the lack of explainability of many of its models. We focus on explanations for discrete Bayesian network classifiers (BCs), targeting greater transparency of their inner workings by including intermediate variables in explanations, rather than just the input and output variables as is standard practice. The proposed influence-driven explanations (IDXs) for BCs are systematically generated using the causal relationships between variables within the BC, called influences, which are then categorised by logical requirements, called relation properties, according to their behaviour. These relation properties both provide guarantees beyond heuristic explanation methods and allow the information underpinning an explanation to be tailored to a particular context's and user's requirements, e.g., IDXs may be dialectical or counterfactual. We demonstrate IDXs' capability to explain various forms of BCs, e.g., naive or multi-label, binary or categorical, and also integrate recent approaches to explanations for BCs from the literature. We evaluate IDXs with theoretical and empirical analyses, demonstrating their considerable advantages when compared with existing explanation methods.
