Paper Title
Improving Disease Classification Performance and Explainability of Deep Learning Models in Radiology with Heatmap Generators
Paper Authors
Paper Abstract
As deep learning is widely used in the radiology field, the explainability of such models is increasingly essential to gain clinicians' trust when the models are used for diagnosis. In this research, three sets of experiments were conducted with a U-Net architecture to improve classification performance while enhancing the heatmaps corresponding to the model's focus, by incorporating heatmap generators during training. All experiments used a dataset containing chest radiographs, an associated label from one of three conditions ("normal", "congestive heart failure (CHF)", and "pneumonia"), and numerical information on a radiologist's eye-gaze coordinates over each image. The paper that introduced this dataset (A. Karargyris and Moradi, 2021) developed a U-Net model, treated as the baseline model for this research, to show how eye-gaze data can be used in multi-modal training to improve explainability. To compare classification performance, 95% confidence intervals (CI) of the area under the receiver operating characteristic curve (AUC) were measured. The best method achieved an AUC of 0.913 (CI: 0.860-0.966). The greatest improvements were for the "pneumonia" and "CHF" classes, which the baseline model struggled most to classify, with AUCs of 0.859 (CI: 0.732-0.957) and 0.962 (CI: 0.933-0.989), respectively. The proposed method's decoder was also able to produce probability masks that highlight the image regions determining the model's classification, similar to the radiologist's eye-gaze data. Hence, this work showed that incorporating heatmap generators and eye-gaze information into training can simultaneously improve disease classification and provide explainable visuals that align well with how the radiologist viewed the chest radiographs when making diagnoses.
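To make the multi-modal training idea concrete, below is a minimal PyTorch sketch of a U-Net-style network whose encoder bottleneck feeds a classification head while a decoder produces a probability mask supervised by the radiologist's eye-gaze heatmap. All layer sizes, the loss weighting, and the module names are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: joint disease classification + gaze-supervised heatmap decoding.
# Illustrative only; architecture details are assumptions, not the paper's model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyUNetClassifier(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        # Encoder: two conv blocks, the second downsampling by 2.
        self.enc1 = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        # Classification head on globally pooled bottleneck features.
        self.head = nn.Linear(32, num_classes)
        # Decoder: upsamples the bottleneck back to a 1-channel heatmap.
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, x):
        f = self.enc2(self.enc1(x))
        logits = self.head(f.mean(dim=(2, 3)))  # diagnosis logits
        mask_logits = self.dec(f)               # heatmap logits, same spatial size as x
        return logits, mask_logits

model = TinyUNetClassifier()
x = torch.randn(4, 1, 64, 64)       # batch of chest radiographs (toy resolution)
y = torch.randint(0, 3, (4,))       # labels: normal / CHF / pneumonia
gaze = torch.rand(4, 1, 64, 64)     # eye-gaze heatmaps, normalized to [0, 1]

logits, mask_logits = model(x)
# Joint loss: cross-entropy for the diagnosis plus a pixel-wise term pushing
# the decoder's mask toward the radiologist's gaze heatmap (weight is assumed).
loss = F.cross_entropy(logits, y) \
       + 0.5 * F.binary_cross_entropy_with_logits(mask_logits, gaze)
loss.backward()
```

At inference time, applying a sigmoid to `mask_logits` yields the probability mask the abstract describes, which can be overlaid on the radiograph as the explainability visual.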
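On the evaluation side, a common way to obtain the 95% CI of an AUC is bootstrap resampling of the test set. The sketch below assumes that approach (the abstract does not state which CI method was used) and runs scikit-learn's roc_auc_score on toy data.

```python
# Sketch: 95% bootstrap CI for AUC. The resampling scheme and sample data
# are assumptions for illustration, not the paper's exact procedure.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)                    # toy binary labels
y_score = y_true * 0.6 + rng.normal(0.2, 0.3, size=200)  # toy model scores

aucs = []
for _ in range(2000):
    idx = rng.integers(0, len(y_true), len(y_true))  # resample with replacement
    if len(np.unique(y_true[idx])) < 2:
        continue                                     # AUC undefined without both classes
    aucs.append(roc_auc_score(y_true[idx], y_score[idx]))

lo, hi = np.percentile(aucs, [2.5, 97.5])
print(f"AUC = {roc_auc_score(y_true, y_score):.3f} (95% CI: {lo:.3f}-{hi:.3f})")
```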