Paper Title

Reconstructing Missing EHRs Using Time-Aware Within- and Cross-Visit Information for Septic Shock Early Prediction

Authors

Ge Gao, Farzaneh Khoshnevisan, Min Chi

Abstract

Real-world Electronic Health Records (EHRs) are often plagued by a high rate of missing data. In our EHRs, for example, the missing rates can be as high as 90% for some features, with an average missing rate of around 70% across all features. We propose a Time-Aware Dual-Cross-Visit missing value imputation method, named TA-DualCV, which spontaneously leverages multivariate dependencies across features and longitudinal dependencies both within- and cross-visit to maximize the information extracted from limited observable records in EHRs. Specifically, TA-DualCV captures the latent structure of missing patterns across measurements of different features; it also considers time continuity and captures the latent temporal missing patterns based on both time-steps and irregular time-intervals. TA-DualCV is evaluated using three large real-world EHRs on two types of tasks: an unsupervised imputation task with mask rates varied up to 90%, and a supervised 24-hour early prediction of septic shock using Long Short-Term Memory (LSTM). Our results show that TA-DualCV performs significantly better than all of the existing state-of-the-art imputation baselines, such as DETROIT and TAME, on both types of tasks.
