Paper Title

Can Prompt Probe Pretrained Language Models? Understanding the Invisible Risks from a Causal View

Paper Authors

Boxi Cao, Hongyu Lin, Xianpei Han, Fangchao Liu, Le Sun

Paper Abstract

Prompt-based probing has been widely used to evaluate the abilities of pretrained language models (PLMs). Unfortunately, recent studies have discovered that such an evaluation may be inaccurate, inconsistent, and unreliable. Furthermore, the lack of understanding of its inner workings, combined with its wide applicability, has the potential to lead to unforeseen risks when evaluating and applying PLMs in real-world applications. To discover, understand, and quantify the risks, this paper investigates prompt-based probing from a causal view, highlights three critical biases that could induce biased results and conclusions, and proposes to conduct debiasing via causal intervention. This paper provides valuable insights for the design of unbiased datasets, better probing frameworks, and more reliable evaluations of pretrained language models. Furthermore, our conclusions also echo the need to rethink the criteria for identifying better pretrained language models. We openly release the source code and data at https://github.com/c-box/causalEval.
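For readers unfamiliar with the setup, the following is a minimal sketch of prompt-based probing with a masked PLM, using the Hugging Face transformers fill-mask pipeline. The model name (bert-base-cased) and the cloze prompt are illustrative assumptions for the demo, not taken from the paper or its repository.

```python
# Minimal, illustrative sketch of prompt-based factual probing:
# query a masked language model with a cloze-style prompt and
# read off its top predictions for the [MASK] slot.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-cased")

# Cloze prompt probing a relational fact (illustrative example).
prompt = "The capital of France is [MASK]."

for pred in fill_mask(prompt, top_k=3):
    # Each prediction carries the filled token and its probability score.
    print(f"{pred['token_str']:>10}  score={pred['score']:.3f}")
```

Prompt-based probing benchmarks score exactly these cloze predictions, which is why biases in the prompt wording or in the probing data can distort the resulting evaluation of a PLM.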
