Paper Title
NILE: Natural Language Inference with Faithful Natural Language Explanations
Paper Authors
Paper Abstract
The recent growth in the popularity and success of deep learning models on NLP classification tasks has been accompanied by the need to generate some form of natural language explanation of the predicted labels. Such generated natural language (NL) explanations are expected to be faithful, i.e., they should correlate well with the model's internal decision making. In this work, we focus on the task of natural language inference (NLI) and address the following question: can we build NLI systems which produce labels with high accuracy, while also generating faithful explanations of their decisions? We propose Natural-language Inference over Label-specific Explanations (NILE), a novel NLI method which utilizes auto-generated label-specific NL explanations to produce labels together with faithful explanations of those labels. We demonstrate NILE's effectiveness over previously reported methods through automated and human evaluation of the produced labels and explanations. Our evaluation of NILE also supports the claim that accurate systems capable of providing testable explanations of their decisions can be designed. We discuss the faithfulness of NILE's explanations in terms of the sensitivity of its decisions to the corresponding explanations. We argue that explicit evaluation of faithfulness, in addition to label and explanation accuracy, is an important step in evaluating a model's explanations. Further, we demonstrate that task-specific probes are necessary to establish such sensitivity.
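To make the two-stage pipeline the abstract describes more concrete, here is a minimal sketch: label-specific generators first produce one candidate explanation per NLI label, and an explanation processor then selects the label via its explanation, so the returned explanation is directly tied to the decision. All function names, the scoring heuristic, and the template generator below are hypothetical placeholders for illustration, not the paper's actual models or API.

```python
# Minimal sketch of a NILE-style two-stage pipeline, under the assumption
# that stage 1 is a per-label explanation generator and stage 2 is an
# explanation processor that scores each (pair, explanation) candidate.
# Placeholder logic stands in for the fine-tuned models used in the paper.

LABELS = ["entailment", "contradiction", "neutral"]

def generate_explanation(premise: str, hypothesis: str, label: str) -> str:
    """Stage 1 (hypothetical): produce an NL explanation arguing for `label`.
    In the paper this role is played by label-specific generators; here a
    trivial template keeps the sketch runnable."""
    return f"Assuming {label}: '{hypothesis}' follows this relation to '{premise}'."

def score_explanation(premise: str, hypothesis: str, explanation: str) -> float:
    """Stage 2 (hypothetical): score how well the candidate explanation
    supports a decision for this premise-hypothesis pair. A learned
    classifier would be used in practice; word overlap is a placeholder."""
    return float(len(set(explanation.split()) & set(premise.split())))

def predict(premise: str, hypothesis: str):
    # One candidate explanation per label; the label whose explanation
    # scores highest is returned along with that explanation, so the
    # explanation is testable against the decision it produced.
    candidates = {
        label: generate_explanation(premise, hypothesis, label)
        for label in LABELS
    }
    scores = {
        label: score_explanation(premise, hypothesis, expl)
        for label, expl in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, candidates[best]

if __name__ == "__main__":
    label, explanation = predict(
        "A man is playing a guitar on stage.",
        "A person is performing music.",
    )
    print(label, "->", explanation)
```

A sensitivity probe in this setting would perturb or swap the candidate explanations fed to `predict` and check whether the chosen label changes; a faithful system's decisions should be sensitive to such edits, which is the kind of task-specific probing the abstract argues for.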