Paper Title
Intrinsic Bias Identification on Medical Image Datasets
Authors
Abstract
Machine-learning-based medical image analysis depends heavily on datasets. Biases in a dataset can be learned by the model and degrade the generalizability of the resulting applications. Debiased models have been studied, but it remains difficult for scientists and practitioners to identify implicit biases in datasets, which leads to a lack of reliable unbiased test datasets for validating models. To tackle this issue, we first define the data intrinsic bias attribute and then propose a novel bias identification framework for medical image datasets. The framework contains two major components, KlotskiNet and Bias Discriminant Direction Analysis (BDDA): KlotskiNet builds the mapping under which backgrounds distinguish positive from negative samples, and BDDA provides a theoretical solution for determining bias attributes. Experimental results on three datasets show the effectiveness of the bias attributes discovered by the framework.