Paper Title

Simplicity Bias Leads to Amplified Performance Disparities

Authors

Samuel J. Bell, Levent Sagun

Abstract

Which parts of a dataset will a given model find difficult? Recent work has shown that SGD-trained models have a bias towards simplicity, leading them to prioritize learning a majority class, or to rely upon harmful spurious correlations. Here, we show that the preference for "easy" runs far deeper: a model may prioritize any class or group of the dataset that it finds simple, at the expense of what it finds complex, as measured by performance difference on the test set. When subsets with different levels of complexity align with demographic groups, we term this difficulty disparity, a phenomenon that occurs even with balanced datasets that lack group/label associations. We show how difficulty disparity is a model-dependent quantity, and is further amplified in commonly used models as selected by typical average performance scores. We quantify an amplification factor across a range of settings in order to compare disparity of different models on a fixed dataset. Finally, we present two real-world examples of difficulty amplification in action, resulting in worse-than-expected performance disparities between groups even when using a balanced dataset. The existence of such disparities in balanced datasets demonstrates that merely balancing sample sizes of groups is not sufficient to ensure unbiased performance. We hope this work presents a step towards a measurable understanding of the role of model bias as it interacts with the structure of data, and call for additional model-dependent mitigation methods to be deployed alongside dataset audits.
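
The abstract measures disparity as a performance difference between groups on the test set. A minimal sketch of that kind of measurement follows, assuming accuracy as the metric and the gap between the best and worst group as the summary. The function names `group_accuracies` and `accuracy_gap` and the toy data are hypothetical illustrations, not the authors' code, and the paper's amplification factor is not defined in the abstract, so it is not reproduced here.

```python
from collections import defaultdict


def group_accuracies(labels, predictions, groups):
    """Return test accuracy per group as a dict {group: accuracy}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, y_hat, g in zip(labels, predictions, groups):
        total[g] += 1
        correct[g] += int(y == y_hat)
    return {g: correct[g] / total[g] for g in total}


def accuracy_gap(labels, predictions, groups):
    """Gap between the best and worst group accuracy (an assumed summary)."""
    acc = group_accuracies(labels, predictions, groups)
    return max(acc.values()) - min(acc.values())


# Toy test set: groups "a" and "b" have equal size and equal label balance,
# yet the (made-up) predictions are much worse on group "b".
labels      = [0, 1, 0, 1, 0, 1, 0, 1]
predictions = [0, 1, 0, 1, 0, 0, 1, 1]
groups      = ["a", "a", "a", "a", "b", "b", "b", "b"]

per_group = group_accuracies(labels, predictions, groups)
gap = accuracy_gap(labels, predictions, groups)
```

In this toy case the groups are balanced in size and in labels, but `per_group` still differs between them, which is the situation the abstract describes: balancing sample sizes alone does not rule out a gap.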
