Paper Title

Improving the Fairness of Chest X-ray Classifiers

Paper Authors

Haoran Zhang, Natalie Dullerud, Karsten Roth, Lauren Oakden-Rayner, Stephen Robert Pfohl, Marzyeh Ghassemi

Paper Abstract

Deep learning models have reached or surpassed human-level performance in the field of medical imaging, especially in disease diagnosis using chest x-rays. However, prior work has found that such classifiers can exhibit biases in the form of gaps in predictive performance across protected groups. In this paper, we question whether striving to achieve zero disparities in predictive performance (i.e. group fairness) is the appropriate fairness definition in the clinical setting, over minimax fairness, which focuses on maximizing the performance of the worst-case group. We benchmark the performance of nine methods in improving classifier fairness across these two definitions. We find, consistent with prior work on non-clinical data, that methods which strive to achieve better worst-group performance do not outperform simple data balancing. We also find that methods which achieve group fairness do so by worsening performance for all groups. In light of these results, we discuss the utility of fairness definitions in the clinical setting, advocating for an investigation of the bias-inducing mechanisms in the underlying data generating process whenever possible.
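The abstract contrasts two fairness definitions: group fairness (zero disparity in predictive performance across protected groups) and minimax fairness (maximizing worst-group performance). A minimal illustrative sketch, with hypothetical per-group performance numbers not taken from the paper, shows how the two criteria can disagree:

```python
# Illustrative sketch (hypothetical numbers, not from the paper):
# comparing group fairness vs. minimax fairness on per-group scores.

def fairness_summary(group_scores):
    """Given per-group performance (e.g. AUROC), return the
    group-fairness gap (max disparity across groups) and the
    minimax objective (worst-group performance)."""
    scores = list(group_scores.values())
    gap = max(scores) - min(scores)  # group fairness: drive this toward zero
    worst = min(scores)              # minimax fairness: maximize this
    return gap, worst

# Two hypothetical classifiers evaluated on two protected groups.
clf_a = {"group_0": 0.85, "group_1": 0.75}  # better worst group, larger gap
clf_b = {"group_0": 0.72, "group_1": 0.72}  # zero gap, but worse everywhere

gap_a, worst_a = fairness_summary(clf_a)
gap_b, worst_b = fairness_summary(clf_b)
# clf_b satisfies group fairness (gap = 0) yet is minimax-worse
# (0.72 < 0.75): achieving zero disparity by degrading all groups,
# mirroring the failure mode the paper reports.
```

This toy comparison illustrates the paper's central observation: a method can close the performance gap between groups while leaving every group worse off than before.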
