Paper Title

Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness?

Paper Authors

Alon Jacovi, Yoav Goldberg

Paper Abstract

With the growing popularity of deep-learning based NLP models, comes a need for interpretable systems. But what is interpretability, and what constitutes a high-quality interpretation? In this opinion piece we reflect on the current state of interpretability evaluation research. We call for more clearly differentiating between different desired criteria an interpretation should satisfy, and focus on the faithfulness criteria. We survey the literature with respect to faithfulness evaluation, and arrange the current approaches around three assumptions, providing an explicit form to how faithfulness is "defined" by the community. We provide concrete guidelines on how evaluation of interpretation methods should and should not be conducted. Finally, we claim that the current binary definition for faithfulness sets a potentially unrealistic bar for being considered faithful. We call for discarding the binary notion of faithfulness in favor of a more graded one, which we believe will be of greater practical utility.
