Paper title

NEAR: Named Entity and Attribute Recognition of clinical concepts

Authors

Nath, Namrata; Lee, Sang-Heon; Lee, Ivan

Abstract

Named Entity Recognition (NER), or the extraction of concepts from clinical text, is the task of identifying entities in text and slotting them into categories such as problems, treatments, tests, clinical departments, occurrences (such as admission and discharge) and others. NER forms a critical component of processing and leveraging unstructured data from Electronic Health Records (EHR). While identifying the spans and categories of concepts is itself a challenging task, these entities can also carry attributes, such as negation, that change the meaning conveyed to consumers of the named entities. There has been little research dedicated to identifying entities and their qualifying attributes together. This research aims to contribute to the detection of entities and their corresponding attributes by modelling the NER task as a supervised, multi-label tagging problem in which each attribute is assigned its own sequence of tagging labels. In this paper, we propose three architectures to achieve this multi-label entity tagging: BiLSTM n-CRF, BiLSTM-CRF-Smax-TF and BiLSTM n-CRF-TF. We evaluate these methods on the 2010 i2b2/VA and the i2b2 2012 shared task datasets. Our different models obtain best NER F1 scores of 0.894 and 0.808 on the i2b2 2010/VA and i2b2 2012 datasets respectively. The highest span-based micro-averaged F1 polarity scores obtained were 0.832 and 0.836 on the i2b2 2010/VA and i2b2 2012 datasets respectively, and the highest macro-averaged F1 polarity scores obtained were 0.924 and 0.888 respectively. The modality studies conducted on the i2b2 2012 dataset yielded best scores of 0.818 and 0.501 for span-based micro-averaged F1 and macro-averaged F1 respectively.
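The abstract reports both micro-averaged and macro-averaged F1. As a reading aid, the sketch below uses the standard definitions of the two averages to show how they can diverge when one attribute class is rare. It is not the paper's evaluation code. The class names and counts are hypothetical and chosen only for illustration, and the span matching that would produce such counts is assumed to have been done already.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) for one class
Counts = Tuple[int, int, int]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; returns 0.0 when the class has no counts at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(per_class: Dict[str, Counts]) -> float:
    """Pool the counts over all classes, then compute a single F1."""
    tp = sum(c[0] for c in per_class.values())
    fp = sum(c[1] for c in per_class.values())
    fn = sum(c[2] for c in per_class.values())
    return f1(tp, fp, fn)


def macro_f1(per_class: Dict[str, Counts]) -> float:
    """Compute F1 per class, then take the unweighted mean."""
    return sum(f1(*c) for c in per_class.values()) / len(per_class)


# Hypothetical counts (not taken from the paper): one frequent class that is
# predicted well and one rare class that is predicted poorly.
counts = {
    "common_class": (900, 100, 100),
    "rare_class": (10, 20, 30),
}

print(f"micro-F1 = {micro_f1(counts):.3f}")
print(f"macro-F1 = {macro_f1(counts):.3f}")
```

Because micro-averaging pools counts, frequent classes dominate it, while macro-averaging weights every class equally. A macro score well below the micro score, as in this toy example, is the pattern one would expect when rare classes are recognised less reliably.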
