DrugEHRQA: A Question Answering Dataset on Structured and Unstructured Electronic Health Records for Medicine Related Queries

Paper Title

DrugEHRQA: A Question Answering Dataset on Structured and Unstructured Electronic Health Records For Medicine Related Queries

Authors

Jayetri Bardhan, Anthony Colas, Kirk Roberts, Daisy Zhe Wang

Abstract

This paper develops the first question answering dataset (DrugEHRQA) containing question-answer pairs from both structured tables and unstructured notes from a publicly available Electronic Health Record (EHR). EHRs contain patient records, stored in structured tables and unstructured clinical notes. The information in structured and unstructured EHRs is not strictly disjoint: information may be duplicated, contradictory, or provide additional context between these sources. Our dataset has medication-related queries, containing over 70,000 question-answer pairs. To provide a baseline model and help analyze the dataset, we have used a simple model (MultimodalEHRQA) which uses the predictions of a modality selection network to choose between EHR tables and clinical notes to answer the questions. This is used to direct the questions to the table-based or text-based state-of-the-art QA model. In order to address the problem arising from complex, nested queries, this is the first time Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers (RAT-SQL) has been used to test the structure of query templates in EHR data. Our goal is to provide a benchmark dataset for multi-modal QA systems, and to open up new avenues of research in improving question answering over EHR structured data by using context from unstructured clinical data.
