High-quality Task Division for Large-scale Entity Alignment

Paper title

High-quality Task Division for Large-scale Entity Alignment

Authors

Bing Liu, Wen Hua, Guido Zuccon, Genghong Zhao, Xia Zhang

Abstract

Entity Alignment (EA) aims to match equivalent entities that refer to the same real-world objects and is a key step for Knowledge Graph (KG) fusion. Most neural EA models cannot be applied to large-scale real-life KGs due to their excessive consumption of GPU memory and time. One promising solution is to divide a large EA task into several subtasks such that each subtask only needs to match two small subgraphs of the original KGs. However, it is challenging to divide the EA task without losing effectiveness. Existing methods display low coverage of potential mappings, insufficient evidence in context graphs, and largely differing subtask sizes. In this work, we design the DivEA framework for large-scale EA with high-quality task division. To include in the EA subtasks a high proportion of the potential mappings originally present in the large EA task, we devise a counterpart discovery method that exploits the locality principle of the EA task and the power of trained EA models. Unique to our counterpart discovery method is the explicit modelling of the chance of a potential mapping. We also introduce an evidence passing mechanism to quantify the informativeness of context entities and find the most informative context graphs with flexible control of the subtask size. Extensive experiments show that DivEA achieves higher EA performance than alternative state-of-the-art solutions.
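Two of the quality criteria named in the abstract, coverage of potential mappings and balance of subtask sizes, can be illustrated with a small sketch. This is not the DivEA implementation: the function names `mapping_coverage` and `size_spread`, the representation of a subtask as a pair of entity-ID sets, and the toy data are all assumptions made for illustration.

```python
from typing import Iterable, List, Set, Tuple

# Assumed representation: a subtask is (source-KG entity IDs, target-KG entity IDs).
Subtask = Tuple[Set[str], Set[str]]


def mapping_coverage(subtasks: List[Subtask],
                     gold_mappings: Iterable[Tuple[str, str]]) -> float:
    """Fraction of gold (source, target) pairs whose two entities
    fall into the same subtask, so the pair can still be matched."""
    gold = list(gold_mappings)
    if not gold:
        return 0.0
    covered = sum(
        1 for s, t in gold
        if any(s in src and t in tgt for src, tgt in subtasks)
    )
    return covered / len(gold)


def size_spread(subtasks: List[Subtask]) -> int:
    """Difference between the largest and smallest subtask,
    measured as the total number of entities in each subtask."""
    sizes = [len(src) + len(tgt) for src, tgt in subtasks]
    return max(sizes) - min(sizes)


# Hypothetical toy division of a six-entity EA task into two subtasks.
subtasks = [
    ({"a1", "a2"}, {"b1"}),
    ({"a3"}, {"b2", "b3"}),
]
gold = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]

coverage = mapping_coverage(subtasks, gold)
spread = size_spread(subtasks)
```

In this toy division the pair ("a2", "b2") is split across subtasks, so no subtask can recover it regardless of how good the EA model is. That is the loss of effectiveness a task division tries to avoid.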
