Paper title
On quantitative aspects of model interpretability
Paper authors
Paper abstract
Despite the growing body of work in interpretable machine learning, it remains unclear how to evaluate different explainability methods without resorting to qualitative assessment and user studies. While interpretability is an inherently subjective matter, previous works in cognitive science and epistemology have shown that good explanations do possess aspects that can be objectively judged (apart from fidelity), such as simplicity and broadness. In this paper we propose a set of metrics to programmatically evaluate interpretability methods along these dimensions. In particular, we argue that the performance of methods along these dimensions can be orthogonally imputed to two conceptual parts, namely the feature extractor and the actual explainability method. We experimentally validate our metrics on different benchmark tasks and show how they can be used to guide a practitioner in the selection of the most appropriate method for the task at hand.
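To make the notion of a programmatically evaluated dimension concrete, the sketch below shows one plausible way to score the "simplicity" of a feature-attribution explanation as sparsity: the fraction of features whose attribution magnitude is negligible. This is an illustrative assumption, not the metric actually defined in the paper; the function name `explanation_simplicity` and the threshold `eps` are hypothetical.

```python
import numpy as np

def explanation_simplicity(attributions, eps=1e-6):
    """Hypothetical sparsity-based simplicity score.

    Returns the fraction of features whose attribution magnitude is
    below `eps`; a higher score means a sparser, hence "simpler",
    explanation. Not the paper's actual metric, just an illustration.
    """
    attributions = np.asarray(attributions, dtype=float)
    return float(np.mean(np.abs(attributions) < eps))

# A sparse attribution vector is judged simpler than a dense one.
sparse = [0.9, 0.0, 0.0, 0.0]   # one feature drives the prediction
dense  = [0.3, 0.2, 0.4, 0.1]   # all features contribute
assert explanation_simplicity(sparse) > explanation_simplicity(dense)
```

A metric of this kind depends only on the explanation's output, so it can be computed automatically across methods and datasets without user studies, which is the property the abstract emphasizes.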