Paper Title

Data Valuation for Medical Imaging Using Shapley Value: Application on A Large-scale Chest X-ray Dataset

Authors

Siyi Tang, Amirata Ghorbani, Rikiya Yamashita, Sameer Rehman, Jared A. Dunnmon, James Zou, Daniel L. Rubin

Abstract

The reliability of machine learning models can be compromised when trained on low quality data. Many large-scale medical imaging datasets contain low quality labels extracted from sources such as medical reports. Moreover, images within a dataset may have heterogeneous quality due to artifacts and biases arising from equipment or measurement errors. Therefore, algorithms that can automatically identify low quality data are highly desired. In this study, we used data Shapley, a data valuation metric, to quantify the value of training data to the performance of a pneumonia detection algorithm in a large chest X-ray dataset. We characterized the effectiveness of data Shapley in identifying low quality versus valuable data for pneumonia detection. We found that removing training data with high Shapley values decreased the pneumonia detection performance, whereas removing data with low Shapley values improved the model performance. Furthermore, there were more mislabeled examples in low Shapley value data and more true pneumonia cases in high Shapley value data. Our results suggest that low Shapley value indicates mislabeled or poor quality images, whereas high Shapley value indicates data that are valuable for pneumonia detection. Our method can serve as a framework for using data Shapley to denoise large-scale medical imaging datasets.
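
To make the idea of data Shapley more concrete, the following is a minimal sketch of a permutation-sampling (Monte Carlo) estimate, in which a training point's value is its average marginal contribution to validation performance over random orderings of the training set. It is not the authors' pipeline: the 1-D toy data, the nearest-centroid classifier, and the names `accuracy` and `monte_carlo_shapley` are all assumptions made for illustration, standing in for the chest X-ray images and the pneumonia detection model.

```python
import random

def accuracy(train, val):
    """Validation accuracy of a nearest-centroid classifier on 1-D features.

    Hypothetical stand-in for the pneumonia detection model.
    """
    if not train:
        return 0.5  # assumed chance level for a balanced binary task
    sums = {0: [0.0, 0], 1: [0.0, 0]}
    for x, y in train:
        sums[y][0] += x
        sums[y][1] += 1
    # Only classes that actually appear in the subset get a centroid.
    centroids = {c: s / n for c, (s, n) in sums.items() if n > 0}
    correct = 0
    for x, y in val:
        pred = min(centroids, key=lambda c: abs(x - centroids[c]))
        correct += (pred == y)
    return correct / len(val)

def monte_carlo_shapley(train, val, num_permutations=200, seed=0):
    """Estimate each training point's value by averaging its marginal
    contribution to validation accuracy over random permutations."""
    rng = random.Random(seed)
    n = len(train)
    values = [0.0] * n
    for _ in range(num_permutations):
        order = list(range(n))
        rng.shuffle(order)
        subset = []
        prev_score = accuracy(subset, val)
        for i in order:
            subset.append(train[i])
            score = accuracy(subset, val)
            values[i] += score - prev_score  # marginal contribution of point i
            prev_score = score
    return [v / num_permutations for v in values]

# Toy data (illustrative only): class 0 near 0.0, class 1 near 1.0.
# The last training point is deliberately mislabeled.
train = [(0.0, 0), (0.1, 0), (0.2, 0), (0.9, 1), (1.0, 1), (1.1, 1), (0.05, 1)]
val = [(0.0, 0), (0.15, 0), (0.3, 0), (0.85, 1), (1.0, 1), (1.2, 1)]

values = monte_carlo_shapley(train, val)
for point, value in sorted(zip(train, values), key=lambda pv: pv[1]):
    print(point, round(value, 3))
```

The printed ranking can be inspected to see where the deliberately mislabeled point falls relative to the correctly labeled ones. By construction, the estimated values sum to the difference between the accuracy on the full training set and the accuracy with no training data, because the marginal contributions telescope within each permutation.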
