Paper Title
Interpretation Quality Score for Measuring the Quality of Interpretability Methods
Paper Authors
Abstract
Machine learning (ML) models have been applied to a wide range of natural language processing (NLP) tasks in recent years. In addition to making accurate decisions, the necessity of understanding how models make their decisions has become apparent in many applications. To that end, many interpretability methods that help explain the decision processes of ML models have been developed. Yet, there currently exists no widely accepted metric to evaluate the quality of explanations generated by these methods. As a result, there is no standard way of measuring to what degree an interpretability method achieves an intended objective. Moreover, there is no accepted standard of performance by which we can compare and rank existing interpretability methods. In this paper, we propose a novel metric for quantifying the quality of explanations generated by interpretability methods. We compute the metric on three NLP tasks using six interpretability methods and present our results.