Paper Title

COVID-19 Knowledge Graph: Accelerating Information Retrieval and Discovery for Scientific Literature

Authors

Wise, Colby; Ioannidis, Vassilis N.; Calvo, Miguel Romero; Song, Xiang; Price, George; Kulkarni, Ninad; Brand, Ryan; Bhatia, Parminder; Karypis, George

Abstract

The coronavirus disease (COVID-19) has claimed the lives of over 350,000 people and infected more than 6 million people worldwide. Several search engines have surfaced to provide researchers with additional tools to find and retrieve information from the rapidly growing corpora on COVID-19. These engines lack extraction and visualization tools necessary to retrieve and interpret complex relations inherent to scientific literature. Moreover, because these engines mainly rely upon semantic information, their ability to capture complex global relationships across documents is limited, which reduces the quality of similarity-based article recommendations for users. In this work, we present the COVID-19 Knowledge Graph (CKG), a heterogeneous graph for extracting and visualizing complex relationships between COVID-19 scientific articles. The CKG combines semantic information with document topological information for the application of similar document retrieval. The CKG is constructed using the latent schema of the data, and then enriched with biomedical entity information extracted from the unstructured text of articles using scalable AWS technologies to form relations in the graph. Finally, we propose a document similarity engine that leverages low-dimensional graph embeddings from the CKG with semantic embeddings for similar article retrieval. Analysis demonstrates the quality of relationships in the CKG and shows that it can be used to uncover meaningful information in COVID-19 scientific articles. The CKG helps power www.cord19.aws and is publicly available.
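
The abstract describes a similarity engine that pairs graph embeddings from the CKG with semantic embeddings, but it does not say how the two are combined. The sketch below shows one plausible approach, purely as an illustration: each embedding is L2-normalized, the two are concatenated, and documents are ranked by cosine similarity. The concatenation step, the function names, and the toy vectors are all assumptions and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def combine(semantic, graph):
    """Assumed fusion rule: concatenate the two normalized views, then renormalize."""
    return l2_normalize(l2_normalize(semantic) + l2_normalize(graph))


def cosine(a, b):
    """Cosine similarity for vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


def most_similar(query_id, semantic_emb, graph_emb, top_k=2):
    """Rank all other documents by similarity to the query document."""
    combined = {
        doc: combine(semantic_emb[doc], graph_emb[doc]) for doc in semantic_emb
    }
    query_vec = combined[query_id]
    scores = [
        (doc, cosine(query_vec, vec))
        for doc, vec in combined.items()
        if doc != query_id
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Toy embeddings (hypothetical values, two dimensions each).
semantic_emb = {
    "paper_a": [1.0, 0.0],
    "paper_b": [0.9, 0.1],
    "paper_c": [0.0, 1.0],
}
graph_emb = {
    "paper_a": [0.0, 1.0],
    "paper_b": [0.1, 0.9],
    "paper_c": [1.0, 0.0],
}

ranking = most_similar("paper_a", semantic_emb, graph_emb)
print(ranking)
```

With these toy vectors, paper_b ranks above paper_c for the query paper_a because it is close to the query in both the semantic and the graph view. A weighted sum of per-view similarities would be an equally reasonable alternative under the same assumptions.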
