准确且可解释的深度学习系统改善了观察者的一致性在胸部X光片解释中

论文标题

准确且可解释的深度学习系统改善了观察者的一致性在胸部X光片解释中

An Accurate and Explainable Deep Learning System Improves Interobserver Agreement in the Interpretation of Chest Radiograph

论文作者

Pham, Hieu H., Nguyen, Ha Q., Nguyen, Hieu T., Le, Linh T., Khanh, Lam

论文摘要

最近的人工智能（AI）算法已经在各种医学分类任务上实现了放射科医生级的性能。但是，只有少数研究解决了CXR扫描异常发现的定位，这对于向放射学家解释图像级分类至关重要。我们在本文中介绍了一个称为Vindr-CXR的可解释的深度学习系统，该系统可以将CXR扫描分类为多种胸部疾病，同时将大多数类型的关键发现本地定位在图像上。 Vindr-CXR接受了51,485 CXR扫描的培训，并通过放射科医生提供的边界盒注释进行了培训。它表现出与经验丰富的放射科医生相当的性能，可以在3,000 CXR扫描的回顾性验证集上对6种常见的胸部疾病进行分类，而在接收器操作特征曲线（AUROC）下的平均面积为0.967（95％置信区间[CI]：0.958-0.975）。 VINDR-CXR在独立患者队列中也得到了外部验证，并显示出其稳健性。对于具有14种类型病变的本地化任务，我们的自由响应接收器操作特征（FROC）分析表明，VINDR-CXR以每扫描确定的1.0假阳性病变的速率达到80.2％的灵敏度。还进行了一项前瞻性研究，以衡量VINDR-CXR在协助六位经验丰富的放射科医生方面的临床影响。结果表明，当用作诊断工具时，提出的系统显着改善了放射科医生本身之间的一致性，平均Fleiss的Kappa的同意增加了1.5％。我们还观察到，在放射科医生咨询了Vindr-CXR的建议之后，在平均Cohen的Kappa中，它们与系统之间的一致性显着增加了3.3％。

Recent artificial intelligence (AI) algorithms have achieved radiologist-level performance on various medical classification tasks. However, only a few studies addressed the localization of abnormal findings from CXR scans, which is essential in explaining the image-level classification to radiologists. We introduce in this paper an explainable deep learning system called VinDr-CXR that can classify a CXR scan into multiple thoracic diseases and, at the same time, localize most types of critical findings on the image. VinDr-CXR was trained on 51,485 CXR scans with radiologist-provided bounding box annotations. It demonstrated a comparable performance to experienced radiologists in classifying 6 common thoracic diseases on a retrospective validation set of 3,000 CXR scans, with a mean area under the receiver operating characteristic curve (AUROC) of 0.967 (95% confidence interval [CI]: 0.958-0.975). The VinDr-CXR was also externally validated in independent patient cohorts and showed its robustness. For the localization task with 14 types of lesions, our free-response receiver operating characteristic (FROC) analysis showed that the VinDr-CXR achieved a sensitivity of 80.2% at the rate of 1.0 false-positive lesion identified per scan. A prospective study was also conducted to measure the clinical impact of the VinDr-CXR in assisting six experienced radiologists. The results indicated that the proposed system, when used as a diagnosis supporting tool, significantly improved the agreement between radiologists themselves with an increase of 1.5% in mean Fleiss' Kappa. We also observed that, after the radiologists consulted VinDr-CXR's suggestions, the agreement between each of them and the system was remarkably increased by 3.3% in mean Cohen's Kappa.

下载PDF全文

下载文献需遵守相关版权规定

论文标题