Paper Title
A Marker-based Neural Network System for Extracting Social Determinants of Health
Paper Authors
Paper Abstract
Objective. The impact of social determinants of health (SDoH) on patients' healthcare quality and on health disparities is well known. Many SDoH items are not coded in structured form in electronic health records. They are often captured in free-text clinical notes, but methods for extracting them automatically are limited. We explore a multi-stage pipeline that combines named entity recognition (NER), relation classification (RC), and text classification to extract SDoH information from clinical notes automatically.
Materials and Methods. The study uses the N2C2 Shared Task data, which was collected from two sources of clinical notes: MIMIC-III and University of Washington Harborview Medical Centers. The data contain 4480 social history sections with full annotation for twelve SDoHs. To handle overlapping entities, we developed a novel marker-based NER model and used it in a multi-stage pipeline to extract SDoH information from clinical notes.
Results. Our marker-based system outperformed state-of-the-art span-based models at handling overlapping entities, as measured by the overall Micro-F1 score. It also achieved state-of-the-art performance compared with the shared task methods.
Conclusion. The major finding of this study is that the multi-stage pipeline effectively extracts SDoH information from clinical notes. This approach can potentially improve the understanding and tracking of SDoHs in clinical settings. However, error propagation may be an issue, and further research using external knowledge is needed to improve the extraction of low-resource entities and of entities with complex semantic meanings.
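The abstract does not spell out the marker scheme, so the following is only a minimal sketch of one plausible reading of "marker-based" handling of overlapping entities: each candidate span is wrapped in typed marker tokens in its own copy of the sentence, so two spans that share tokens never compete for the same tag sequence. The function names, the marker format (`<Label>` / `</Label>`), and the labels `StatusTime` and `Tobacco` are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Tuple

# (start token index, end token index exclusive, label); layout is an assumption
Span = Tuple[int, int, str]


def mark_span(tokens: List[str], span: Span) -> List[str]:
    """Return a copy of `tokens` with typed markers inserted around one span."""
    start, end, label = span
    if not (0 <= start < end <= len(tokens)):
        raise ValueError(f"span {span} is out of range for {len(tokens)} tokens")
    return (
        tokens[:start]
        + [f"<{label}>"]
        + tokens[start:end]
        + [f"</{label}>"]
        + tokens[end:]
    )


def marked_copies(tokens: List[str], spans: List[Span]) -> List[List[str]]:
    """Build one marked copy of the sentence per span.

    Overlapping spans end up in separate copies, so a downstream model can
    classify each one independently.
    """
    return [mark_span(tokens, span) for span in spans]


if __name__ == "__main__":
    tokens = "He quit smoking two years ago".split()
    # Two hypothetical spans that overlap on the token "smoking".
    spans = [(1, 3, "StatusTime"), (2, 3, "Tobacco")]
    for copy in marked_copies(tokens, spans):
        print(" ".join(copy))
```

In a pipeline like the one the abstract describes, each marked copy could then be passed to the relation or text classification stage; how the original system wires these stages together is not stated here.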