Paper title
How Stable is Knowledge Base Knowledge?
Paper authors
Paper abstract
Knowledge Bases (KBs) provide a structured representation of the real world in the form of extensive collections of facts about real-world entities, their properties, and their relationships. They are ubiquitous in large-scale intelligent systems that exploit structured information, for example in structured search, question answering, and reasoning, so their data quality is paramount. The inevitability of change in the real world brings us to a central property of KBs: they are highly dynamic, in that the information they contain is constantly subject to change. In other words, KBs are unstable. In this paper, we investigate the notion of KB stability, specifically the problem of KBs changing due to real-world change. Some entity-property pairs no longer undergo change in reality (e.g., Einstein-children or Tesla-founders), while others might well change in the future (e.g., Tesla-board member or Ronaldo-occupation, as of 2022). This notion of real-world grounded change differs from other changes that affect only the data, notably correction and delayed insertion, which have already received attention in data cleaning, vandalism detection, and completeness estimation. To analyze KB stability, we proceed in three steps. (1) We present heuristics to delineate changes due to world evolution from delayed completions and corrections, and use these to study the real-world evolution behaviour of diverse Wikidata domains, finding a high skew in terms of properties. (2) We evaluate heuristics to identify entities and properties that are unlikely to change due to real-world change, and filter inherently stable entities and properties. (3) We evaluate the possibility of predicting stability post-hoc, specifically predicting change in a property of an entity, finding that this is possible with up to 83% F1 score on a balanced binary stability prediction task.
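For readers unfamiliar with the metric in step (3), the following minimal sketch shows how an F1 score is computed on a balanced binary task. The labels and predictions are hypothetical toy values, not data or results from the paper, and the function name `f1_score` is chosen here for illustration only; it does not reproduce the authors' model or evaluation pipeline.

```python
def f1_score(y_true, y_pred, positive=1):
    """Compute F1 for the positive class from two equal-length label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels (assumption): 1 = the entity-property pair changed,
# 0 = it stayed stable. The set is balanced: four of each class.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

On a balanced task such as this one, a classifier that always predicts "changed" reaches a precision of 0.5 and a recall of 1.0, so an F1 of about 0.67; this is a useful reference point when reading a reported F1 score.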