Paper Title

Interpretation of NLP models through input marginalization

Paper Authors

Siwon Kim, Jihun Yi, Eunji Kim, Sungroh Yoon

Paper Abstract

To demystify the "black box" property of deep neural networks for natural language processing (NLP), several methods have been proposed to interpret their predictions by measuring the change in prediction probability after erasing each token of an input. Since existing methods replace each token with a predefined value (i.e., zero), the resulting sentence lies out of the training data distribution, yielding misleading interpretations. In this study, we raise the out-of-distribution problem induced by the existing interpretation methods and present a remedy; we propose to marginalize each token out. We interpret various NLP models trained for sentiment analysis and natural language inference using the proposed method.
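
The core idea is that instead of replacing a token with a fixed value such as zero, the prediction probability is averaged over plausible replacement tokens. The sketch below illustrates this under stated assumptions: a pretrained BERT masked language model (MLM) proposes candidates for the erased position, and a sentiment classifier's prediction is averaged over those candidates weighted by their MLM likelihoods. The model names, the top-k truncation of the marginalization sum, and the helper names are illustrative choices, not the authors' released code.

```python
# A minimal sketch of input marginalization, assuming a BERT MLM supplies
# p(candidate | context) and a BERT SST-2 classifier supplies p(y | x).
# Model names, top_k truncation, and helper names are illustrative.
import math
import torch
from transformers import (AutoModelForMaskedLM,
                          AutoModelForSequenceClassification, AutoTokenizer)

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()
clf = AutoModelForSequenceClassification.from_pretrained(
    "textattack/bert-base-uncased-SST-2").eval()  # assumed sentiment classifier

@torch.no_grad()
def marginalized_prob(input_ids, pos, label, top_k=100):
    """p(y | x_{-i}) ~= sum_c p_MLM(c | x_{-i}) * p(y | x with x_i := c)."""
    masked = input_ids.clone()
    masked[0, pos] = tok.mask_token_id
    mlm_probs = mlm(masked).logits[0, pos].softmax(-1)
    cand_p, cand_ids = mlm_probs.topk(top_k)   # truncate the vocabulary sum
    cand_p = cand_p / cand_p.sum()             # renormalize truncated weights
    p_marg = 0.0
    for p_c, cand in zip(cand_p, cand_ids):
        replaced = input_ids.clone()
        replaced[0, pos] = cand                # substitute the candidate token
        p_y = clf(replaced).logits.softmax(-1)[0, label]
        p_marg += (p_c * p_y).item()
    return p_marg

@torch.no_grad()
def attributions(sentence, label):
    """Score tokens by the log-odds drop when each is marginalized out."""
    ids = tok(sentence, return_tensors="pt").input_ids
    p_orig = clf(ids).logits.softmax(-1)[0, label].item()
    odds = lambda p: p / (1.0 - p)
    scores = []
    for i in range(1, ids.shape[1] - 1):       # skip [CLS] and [SEP]
        p_marg = marginalized_prob(ids, i, label)
        score = math.log2(odds(p_orig)) - math.log2(odds(p_marg))
        scores.append((tok.convert_ids_to_tokens(int(ids[0, i])), score))
    return scores

print(attributions("a gripping and heartfelt film", label=1))
```

Summing over the full vocabulary is expensive, so the sketch truncates the marginalization to the MLM's top-k candidates and renormalizes their weights; the log-odds difference follows the weight-of-evidence style of scoring, so positive scores mark tokens that support the predicted label.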
