Scalable and interpretable rule-based link prediction for large heterogeneous knowledge graphs

Paper title

Scalable and interpretable rule-based link prediction for large heterogeneous knowledge graphs

Authors

Simon Ott, Laura Graf, Asan Agibetov, Christian Meilicke, Matthias Samwald

Abstract

Neural embedding-based machine learning models have shown promise for predicting novel links in biomedical knowledge graphs. Unfortunately, their practical utility is diminished by their lack of interpretability. Recently, the fully interpretable, rule-based algorithm AnyBURL yielded highly competitive results on many general-purpose link prediction benchmarks. However, its applicability to large-scale prediction tasks on complex biomedical knowledge bases is limited by long inference times and difficulties with aggregating predictions made by multiple rules. We improve upon AnyBURL by introducing the SAFRAN rule application framework which aggregates rules through a scalable clustering algorithm. SAFRAN yields new state-of-the-art results for fully interpretable link prediction on the established general-purpose benchmark FB15K-237 and the large-scale biomedical benchmark OpenBioLink. Furthermore, it exceeds the results of multiple established embedding-based algorithms on FB15K-237 and narrows the gap between rule-based and embedding-based algorithms on OpenBioLink. We also show that SAFRAN increases inference speeds by up to two orders of magnitude.
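The abstract says SAFRAN aggregates the predictions of multiple rules by clustering them, but it does not spell out the clustering criterion or the combination function. The following is a minimal Python sketch under stated assumptions, not the SAFRAN implementation: it assumes that redundant rules are grouped when a pairwise similarity score reaches a threshold, that confidences are combined by taking the maximum within a cluster, and that clusters are combined with noisy-or. All function names, the similarity values, and the threshold are hypothetical. For contrast, it also shows plain max aggregation (score by the single best rule) and plain noisy-or over all rules.

```python
from itertools import combinations


def noisy_or(confidences):
    """Combine confidences as if each rule were independent evidence: 1 - prod(1 - c)."""
    p_none = 1.0
    for c in confidences:
        p_none *= 1.0 - c
    return 1.0 - p_none


def max_aggregation(confidences):
    """Score a candidate by its single most confident rule."""
    return max(confidences, default=0.0)


def cluster_rules(rule_ids, similarity, threshold):
    """Group rules whose pairwise similarity reaches `threshold` (union-find).

    Hypothetical helper: `similarity(a, b)` is an assumed function returning a
    redundancy score in [0, 1] for two rules.
    """
    parent = {r: r for r in rule_ids}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    for a, b in combinations(rule_ids, 2):  # all pairs: quadratic, illustration only
        if similarity(a, b) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for r in rule_ids:
        clusters.setdefault(find(r), []).append(r)
    return list(clusters.values())


def clustered_score(firing, clusters):
    """Max within each cluster, noisy-or across clusters (assumed scheme).

    `firing` maps rule id -> confidence for the rules that predict this candidate.
    """
    per_cluster = [
        max(firing[r] for r in cluster if r in firing)
        for cluster in clusters
        if any(r in firing for r in cluster)
    ]
    return noisy_or(per_cluster)


# Hypothetical example: r1 and r2 are near-duplicate rules, r3 is independent.
firing = {"r1": 0.6, "r2": 0.5, "r3": 0.4}
pair_sim = {frozenset(("r1", "r2")): 0.9}
clusters = cluster_rules(
    list(firing), lambda a, b: pair_sim.get(frozenset((a, b)), 0.0), threshold=0.5
)

print(clusters)                                     # [['r1', 'r2'], ['r3']]
print(round(max_aggregation(firing.values()), 2))   # 0.6
print(round(noisy_or(firing.values()), 2))          # 0.88
print(round(clustered_score(firing, clusters), 2))  # 0.76
```

In this toy setting, treating the redundant rules r1 and r2 as independent evidence inflates the score (0.88 with noisy-or over all three rules), while max aggregation ignores the independent rule r3 entirely (0.6); the clustered variant sits between them (about 0.76). The all-pairs loop in `cluster_rules` is quadratic in the number of rules and only illustrates the idea; it says nothing about how a scalable implementation would detect redundant rules.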
