Paper title
ROOD-MRI: Benchmarking the robustness of deep learning segmentation models to out-of-distribution and corrupted data in MRI
Paper authors
Abstract
Deep artificial neural networks (DNNs) have moved to the forefront of medical image analysis due to their success in classification, segmentation, and detection challenges. A principal challenge in large-scale deployment of DNNs in neuroimage analysis is the potential for shifts in signal-to-noise ratio, contrast, resolution, and presence of artifacts from site to site due to variances in scanners and acquisition protocols. DNNs are famously susceptible to these distribution shifts in computer vision. Currently, there are no benchmarking platforms or frameworks to assess the robustness of new and existing models to specific distribution shifts in MRI, and accessible multi-site benchmarking datasets are still scarce or task-specific. To address these limitations, we propose ROOD-MRI: a platform for benchmarking the Robustness of DNNs to Out-Of-Distribution (OOD) data, corruptions, and artifacts in MRI. The platform provides modules for generating benchmarking datasets using transforms that model distribution shifts in MRI, implementations of newly derived benchmarking metrics for image segmentation, and examples for using the methodology with new models and tasks. We apply our methodology to hippocampus, ventricle, and white matter hyperintensity segmentation in several large studies, providing the hippocampus dataset as a publicly available benchmark. By evaluating modern DNNs on these datasets, we demonstrate that they are highly susceptible to distribution shifts and corruptions in MRI. We show that while data augmentation strategies can substantially improve robustness to OOD data for anatomical segmentation tasks, modern DNNs using augmentation still lack robustness in more challenging lesion-based segmentation tasks. We finally benchmark U-Nets and transformer-based models, finding consistent differences in robustness to particular classes of transforms across architectures.
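The benchmarking idea described in the abstract (apply a transform that models a distribution shift at increasing severity, then measure how much segmentation quality drops) can be illustrated with a minimal sketch. This is not the ROOD-MRI API and does not reproduce the paper's newly derived metrics: the phantom image, the thresholding "model", the severity-to-sigma mapping, and all function names below are hypothetical stand-ins chosen for illustration, and the metric is the plain Dice coefficient.

```python
import random

# Hypothetical severity levels: noise standard deviation per level (an assumption,
# not the values used by ROOD-MRI).
SEVERITY_SIGMAS = {1: 0.1, 2: 0.2, 3: 0.4, 4: 0.7, 5: 1.0}


def make_phantom(size=64):
    """Toy 'scan': a bright square (1.0) on a dark background (0.0), plus its label."""
    lo, hi = size // 4, 3 * size // 4
    image = [
        [1.0 if lo <= r < hi and lo <= c < hi else 0.0 for c in range(size)]
        for r in range(size)
    ]
    label = [[v > 0.5 for v in row] for row in image]
    return image, label


def add_gaussian_noise(image, sigma, rng):
    """Corruption standing in for a drop in signal-to-noise ratio."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in image]


def threshold_model(image):
    """Stand-in for a trained segmentation network: a fixed intensity threshold."""
    return [[v > 0.5 for v in row] for row in image]


def dice(pred, target):
    """Dice coefficient between two boolean masks; 1.0 when both are empty."""
    inter = sum(p and t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    total = sum(map(sum, pred)) + sum(map(sum, target))
    return 1.0 if total == 0 else 2.0 * inter / total


def benchmark(seed=0):
    """Return the clean Dice score and a dict of Dice scores per severity level."""
    rng = random.Random(seed)
    image, label = make_phantom()
    clean_score = dice(threshold_model(image), label)
    scores = {}
    for severity, sigma in SEVERITY_SIGMAS.items():
        corrupted = add_gaussian_noise(image, sigma, rng)
        scores[severity] = dice(threshold_model(corrupted), label)
    return clean_score, scores


if __name__ == "__main__":
    clean_score, scores = benchmark()
    print(f"clean Dice: {clean_score:.3f}")
    for severity, score in scores.items():
        # Degradation is the drop in Dice relative to the uncorrupted input.
        print(f"severity {severity}: Dice {score:.3f}, degradation {clean_score - score:.3f}")
```

In a real evaluation the threshold function would be replaced by a trained network, the phantom by held-out MRI volumes, and the single noise corruption by a family of transforms covering contrast, resolution, and artifact shifts.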