Paper Title
Open-radiomics: A Collection of Standardized Datasets and a Technical Protocol for Reproducible Radiomics Machine Learning Pipelines
Authors
Abstract
Background: As an important branch of machine learning pipelines in medical imaging, radiomics faces two major challenges, namely reproducibility and accessibility. In this work, we introduce open-radiomics, a set of radiomics datasets along with a comprehensive radiomics pipeline based on our proposed technical protocol to improve the reproducibility of results. Methods: We curated large-scale radiomics datasets based on three open-source datasets: BraTS 2020 for high-grade glioma (HGG) versus low-grade glioma (LGG) classification and survival analysis, BraTS 2023 for O6-methylguanine-DNA methyltransferase classification, and non-small cell lung cancer survival analysis from the Cancer Imaging Archive. Using the BraTS 2020 Magnetic Resonance Imaging (MRI) dataset, we applied our protocol to 369 brain tumor patients (76 LGG, 293 HGG). Leveraging PyRadiomics for LGG vs. HGG classification, we generated 288 datasets from 4 MRI sequences, 3 binWidths, 6 normalization methods, and 4 tumor subregions. Random Forest classifiers were trained, validated, and tested (60%, 20%, 20%) across 100 different data splits (28,800 test results), evaluating the Area Under the Receiver Operating Characteristic Curve (AUROC). Results: Unlike binWidth and image normalization, tumor subregion and imaging sequence significantly affected the performance of the models. The T1 contrast-enhanced sequence combined with the union of the necrotic and non-enhancing tumor core subregions resulted in the highest AUROCs (average test AUROC 0.951, 95% confidence interval (0.949, 0.952)). Although several settings and data splits (28 out of 28,800) yielded a test AUROC of 1, they were irreproducible. Conclusion: Our experiments demonstrate that sources of variability in radiomics pipelines (e.g., tumor subregion) can have a significant impact on the results, which may lead to superficially perfect performance that is irreproducible.
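The experiment counts in the abstract follow from a full factorial design: 4 × 3 × 6 × 4 = 288 feature datasets, each evaluated on 100 data splits, for 28,800 test results. The sketch below only illustrates that bookkeeping and is not the authors' pipeline. The four sequence names are the standard BraTS modalities; the binWidth, normalization, and subregion labels are hypothetical placeholders, because the abstract gives only their counts. The seeded split helper is likewise an assumption: it is a plain, unstratified 60/20/20 shuffle.

```python
from itertools import product
import random

# BraTS provides these four MRI sequences. The other option lists are
# hypothetical placeholders: the abstract states only how many there are.
SEQUENCES = ["T1", "T1ce", "T2", "FLAIR"]
BIN_WIDTHS = [f"binWidth_{i}" for i in range(3)]
NORMALIZATIONS = [f"norm_{i}" for i in range(6)]
SUBREGIONS = [f"subregion_{i}" for i in range(4)]

N_SPLITS = 100     # data splits per dataset, as stated in the abstract
N_PATIENTS = 369   # 76 LGG + 293 HGG


def build_grid():
    """Return every (sequence, binWidth, normalization, subregion) combination."""
    return list(product(SEQUENCES, BIN_WIDTHS, NORMALIZATIONS, SUBREGIONS))


def split_indices(n, seed, fractions=(0.6, 0.2, 0.2)):
    """Shuffle patient indices with a fixed seed and cut them 60/20/20.

    Illustrative and unstratified; the test set takes whatever remains
    after the training and validation sets are cut.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = int(n * fractions[0])
    n_val = int(n * fractions[1])
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


grid = build_grid()
total_test_results = len(grid) * N_SPLITS
train, val, test = split_indices(N_PATIENTS, seed=0)

print(len(grid), total_test_results)
print(len(train), len(val), len(test))
```

Fixing the seed per split is what makes an individual split repeatable; reporting the distribution over all 100 seeds, rather than the best one, is what guards against the isolated AUROC of 1 that the abstract describes as irreproducible.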