Assessing the (Un)Trustworthiness of Saliency Maps for Localizing Abnormalities in Medical Imaging

Paper Title

Assessing the (Un)Trustworthiness of Saliency Maps for Localizing Abnormalities in Medical Imaging

Authors

Arun, Nishanth; Gaw, Nathan; Singh, Praveer; Chang, Ken; Aggarwal, Mehak; Chen, Bryan; Hoebel, Katharina; Gupta, Sharut; Patel, Jay; Gidwani, Mishka; Adebayo, Julius; Li, Matthew D.; Kalpathy-Cramer, Jayashree

Abstract

Saliency maps have become a widely used method to make deep learning models more interpretable by providing post-hoc explanations of classifiers through identification of the most pertinent areas of the input medical image. They are increasingly being used in medical imaging to provide clinically plausible explanations for the decisions the neural network makes. However, the utility and robustness of these visualization maps have not yet been rigorously examined in the context of medical imaging. We posit that trustworthiness in this context requires 1) localization utility, 2) sensitivity to model weight randomization, 3) repeatability, and 4) reproducibility. Using the localization information available in two large public radiology datasets, we quantify the performance of eight commonly used saliency map approaches for the above criteria using area under the precision-recall curve (AUPRC) and structural similarity index (SSIM), comparing their performance to various baseline measures. Using our framework to quantify the trustworthiness of saliency maps, we show that all eight saliency map techniques fail at least one of the criteria and are, in most cases, less trustworthy when compared to the baselines. We suggest that their usage in the high-risk domain of medical imaging warrants additional scrutiny and recommend that detection or segmentation models be used if localization is the desired output of the network. Additionally, to promote reproducibility of our findings, we provide the code we used for all tests performed in this work at this link: https://github.com/QTIM-Lab/Assessing-Saliency-Maps.
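The abstract names its two measures without defining them. Below is a minimal sketch of how they might be computed for a single image, assuming a 2D saliency map and a binary ground-truth mask (derived from a dataset's localization annotations) given as equal-sized nested lists. The helper names `pixel_auprc` and `global_ssim` are hypothetical, and this is not the authors' code (that lives in the linked repository). AUPRC is estimated here as per-pixel average precision, and SSIM is computed over the whole image as one window rather than with the sliding local windows that library implementations use, so values will differ from those tools. The toy 4x4 values in the usage block are made up for illustration.

```python
from typing import List, Sequence

Grid = Sequence[Sequence[float]]  # a 2D image as nested rows


def _flatten(img: Grid) -> List[float]:
    """Flatten a 2D grid into a list of floats, row by row."""
    return [float(v) for row in img for v in row]


def pixel_auprc(saliency: Grid, mask: Grid) -> float:
    """Hypothetical helper: average precision of per-pixel saliency scores
    against a binary ground-truth mask (nonzero = abnormal pixel).

    Average precision is a step-wise estimate of the area under the
    precision-recall curve. Ties in saliency are broken by pixel order,
    a simplification compared with threshold-based implementations.
    """
    scores = _flatten(saliency)
    labels = [v > 0 for v in _flatten(mask)]
    if len(scores) != len(labels):
        raise ValueError("saliency map and mask must have the same size")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("mask contains no abnormal pixels")
    # Rank pixels from most to least salient.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank  # precision at each new recall level
    return total / n_pos


def global_ssim(a: Grid, b: Grid, data_range: float = 1.0) -> float:
    """Hypothetical helper: SSIM over the whole image as a single window,
    using the standard constants K1 = 0.01 and K2 = 0.03. Library SSIM
    implementations average this quantity over local windows instead."""
    x, y = _flatten(a), _flatten(b)
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and the same size")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((p - mx) * (p - mx) for p in x) / n
    vy = sum((q - my) * (q - my) for q in y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y)) / n
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2)
    )


if __name__ == "__main__":
    # Toy 4x4 case: a 2x2 "abnormality" in the center of the image.
    mask = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
    sal = [[0.01, 0.02, 0.03, 0.04], [0.05, 0.90, 0.80, 0.06],
           [0.07, 0.85, 0.95, 0.08], [0.09, 0.10, 0.11, 0.12]]
    print(pixel_auprc(sal, mask))  # 1.0: the four most salient pixels are the abnormal ones
    inverted = [[1.0 - v for v in row] for row in sal]
    print(global_ssim(sal, inverted))  # negative: the structure is anti-correlated
```

Under the abstract's framework, the same SSIM comparison could be applied between two saliency maps of one image, for example one from the trained model and one after its weights are randomized; a high score there would suggest the map is insensitive to randomization.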
