Paper Title
Knowledge Graph-based Question Answering with Electronic Health Records
Authors
Abstract
Question Answering (QA) is a widely used framework for developing and evaluating intelligent machines. In this light, QA on Electronic Health Records (EHR), namely EHR QA, can serve as a crucial milestone towards developing an intelligent agent in healthcare. EHR data are typically stored in a relational database, which can also be converted to a directed acyclic graph, allowing two approaches to EHR QA: table-based QA and knowledge graph-based QA. We hypothesize that the graph-based approach is more suitable for EHR QA, because graphs represent relations between entities and values more naturally than tables, which essentially require JOIN operations. In this paper, we propose a graph-based EHR QA in which natural language queries are converted to SPARQL instead of SQL. To validate our hypothesis, we create four EHR QA datasets (graph-based vs. table-based, and simplified database schema vs. original database schema), derived from the table-based dataset MIMICSQL. We test both a simple Seq2Seq model and a state-of-the-art EHR QA model on all datasets; the graph-based datasets yield up to 34% higher accuracy than their table-based counterparts, without any modification to the model architectures. Finally, all datasets are open-sourced to encourage further EHR QA research in both directions.
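As a rough illustration of the contrast drawn in the abstract, the sketch below answers one question two ways: with a SQL JOIN over two tables, and with triple-pattern matching over a graph built from the same rows. Everything in it is an assumption made for illustration: the two-table schema, the column names, the patient values, the `ex:` prefix, and the helper functions `rows_to_triples`, `match`, and `graph_answer` are hypothetical and are not taken from MIMICSQL or from the paper's conversion procedure. The SPARQL string is shown only as the kind of target query a model would generate; it is not executed, and a small pure-Python matcher stands in for a SPARQL engine.

```python
import sqlite3

# Hypothetical toy schema and rows; NOT the actual MIMICSQL schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE demographic (hadm_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE diagnoses (hadm_id INTEGER, short_title TEXT);
INSERT INTO demographic VALUES (1, 'Patient A'), (2, 'Patient B');
INSERT INTO diagnoses VALUES (1, 'Hypertension'), (2, 'Asthma');
""")

# Table-based target: the name and the diagnosis live in different tables,
# so the query has to JOIN them on the shared key.
SQL = """
SELECT d.name
FROM demographic AS d
JOIN diagnoses AS g ON d.hadm_id = g.hadm_id
WHERE g.short_title = 'Hypertension'
"""
sql_answer = [row[0] for row in conn.execute(SQL)]


def rows_to_triples(connection):
    """Turn each row into (subject, predicate, object) triples.

    Assumed conversion: one node per admission, one edge per column value.
    """
    triples = set()
    for hadm_id, name in connection.execute(
        "SELECT hadm_id, name FROM demographic"
    ):
        triples.add((f"hadm/{hadm_id}", "name", name))
    for hadm_id, title in connection.execute(
        "SELECT hadm_id, short_title FROM diagnoses"
    ):
        triples.add((f"hadm/{hadm_id}", "diagnosis", title))
    return triples


# Graph-based target (illustrative string only, not executed here):
# both facts hang off the same admission node, so no JOIN is written.
SPARQL = """
PREFIX ex: <http://example.org/>
SELECT ?name WHERE {
  ?adm ex:diagnosis "Hypertension" .
  ?adm ex:name ?name .
}
"""


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        (ts, tp, to)
        for ts, tp, to in triples
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    ]


def graph_answer(triples):
    """Evaluate the two triple patterns of the SPARQL sketch by hand."""
    admissions = [s for s, _, _ in match(triples, p="diagnosis", o="Hypertension")]
    return sorted(
        o for adm in admissions for _, _, o in match(triples, s=adm, p="name")
    )
```

Calling `graph_answer(rows_to_triples(conn))` returns the same patient names as `sql_answer`; the point of the sketch is only that the graph query names the relations directly, whereas the SQL query must spell out how the tables connect.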