KMIR: A Benchmark for Evaluating Knowledge Memorization, Identification and Reasoning Abilities of Language Models

Paper Title

KMIR: A Benchmark for Evaluating Knowledge Memorization, Identification and Reasoning Abilities of Language Models

Authors

Gao, Daniel; Jia, Yantao; Li, Lei; Fu, Chengzhen; Dou, Zhicheng; Jiang, Hao; Zhang, Xinyu; Chen, Lei; Cao, Zhao

Abstract

Previous work shows the great potential of pre-trained language models (PLMs) for storing large amounts of factual knowledge. However, to determine whether PLMs can be reliable knowledge sources and serve as alternative knowledge bases (KBs), we need to further explore some critical features of PLMs. First, knowledge memorization and identification abilities: traditional KBs can store various types of entities and relationships; do PLMs have a high enough knowledge capacity to store different types of knowledge? Second, reasoning ability: a qualified knowledge source should not only provide a collection of facts but also support a symbolic reasoner. Can PLMs derive new knowledge based on the correlations between facts? To evaluate these features of PLMs, we propose a benchmark named the Knowledge Memorization, Identification, and Reasoning test (KMIR). KMIR covers three types of knowledge (general knowledge, domain-specific knowledge, and commonsense) and provides 184,348 well-designed questions. Preliminary experiments with various representative pre-trained language models on KMIR reveal many interesting phenomena: 1) The memorization ability of PLMs depends more on the number of parameters than on training schemes. 2) Current PLMs struggle to robustly remember facts. 3) Model compression technology retains the amount of knowledge well but hurts the identification and reasoning abilities. We hope KMIR can facilitate the design of PLMs as better knowledge sources.
