Paper Title
Early Diagnosis of Chronic Obstructive Pulmonary Disease from Chest X-Rays using Transfer Learning and Fusion Strategies
Paper Authors
Paper Abstract
Chronic obstructive pulmonary disease (COPD) is one of the most common chronic illnesses in the world and the third leading cause of mortality worldwide. It is often underdiagnosed or not diagnosed until late in the disease course. Spirometry is the gold standard for diagnosing COPD but can be difficult to obtain, especially in resource-poor countries. Chest X-rays (CXRs), however, are readily available and may serve as a screening tool to identify patients with COPD who should undergo further testing. Currently, no research applies deep learning (DL) algorithms to large multi-site and multi-modal data to detect COPD patients and evaluate fairness across demographic groups. We use three CXR datasets in our study: CheXpert to pre-train models, MIMIC-CXR to develop them, and Emory-CXR to validate them. CXRs from patients who are in the early stage of COPD and not on mechanical ventilation are selected for model training and validation. We visualize Grad-CAM heatmaps of the true positive cases from the base model for both the MIMIC-CXR and Emory-CXR test datasets. We further propose two fusion schemes to improve overall model performance: (1) model-level fusion, including bagging and stacking methods using MIMIC-CXR, and (2) data-level fusion, including multi-site data using MIMIC-CXR and Emory-CXR and multi-modal data using MIMIC-CXR and MIMIC-IV EHR. Fairness analysis is performed to evaluate whether the fusion schemes show performance discrepancies among different demographic groups. The results demonstrate that DL models can detect COPD using CXRs, which can facilitate early screening, especially in low-resource regions where CXRs are more accessible than spirometry. The multi-site data fusion scheme could improve model generalizability on the Emory-CXR test data. Further studies on using CXRs or other modalities to predict COPD are left to future work.
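The abstract names model-level fusion and a fairness analysis without giving details. The sketch below is only an illustration under stated assumptions, not the authors' implementation: it shows the aggregation step of a bagging-style ensemble (averaging each model's predicted COPD probability per image) and a simple per-group true-positive-rate comparison. The function names, the 0.5 threshold, the group labels, and all numbers are hypothetical toy values, and the choice of true-positive rate as the fairness metric is an assumption.

```python
from statistics import mean


def fuse_by_averaging(per_model_probs):
    """Average predicted probabilities across models, case by case.

    This is only the aggregation step of bagging; training each model on a
    bootstrap sample is assumed to have happened elsewhere.
    """
    n_cases = len(per_model_probs[0])
    if any(len(p) != n_cases for p in per_model_probs):
        raise ValueError("every model must score the same cases")
    return [mean(case_scores) for case_scores in zip(*per_model_probs)]


def tpr_by_group(labels, probs, groups, threshold=0.5):
    """True-positive rate per demographic group at a fixed threshold."""
    hits, positives = {}, {}
    for y, p, g in zip(labels, probs, groups):
        if y == 1:  # only COPD-positive cases count toward the TPR
            positives[g] = positives.get(g, 0) + 1
            hits[g] = hits.get(g, 0) + (1 if p >= threshold else 0)
    return {g: hits[g] / positives[g] for g in positives}


def tpr_gap(rates):
    """Largest difference in TPR between any two groups."""
    return max(rates.values()) - min(rates.values())


# Hypothetical toy scores from three models on four images.
model_a = [0.9, 0.2, 0.7, 0.4]
model_b = [0.7, 0.4, 0.5, 0.2]
model_c = [0.8, 0.3, 0.9, 0.3]
fused = fuse_by_averaging([model_a, model_b, model_c])

labels = [1, 0, 1, 1]          # 1 = COPD, 0 = no COPD (toy labels)
groups = ["F", "F", "M", "M"]  # hypothetical demographic attribute
rates = tpr_by_group(labels, fused, groups)
print(rates, tpr_gap(rates))
```

A stacking variant would replace the plain average with a meta-learner trained on the base models' outputs. A gap near zero in a check like this would suggest the fused model detects positive cases at similar rates across groups.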