Bias Mimicking: A Simple Sampling Approach for Bias Mitigation

Paper Title


Bias Mimicking: A Simple Sampling Approach for Bias Mitigation

Authors

Maan Qraitem, Kate Saenko, Bryan A. Plummer

Abstract


Prior work has shown that visual recognition datasets frequently underrepresent bias groups $B$ (e.g., Female) within class labels $Y$ (e.g., Programmers). This dataset bias can lead to models that learn spurious correlations between class labels and bias groups such as age, gender, or race. Most recent methods that address this problem require significant architectural changes or additional loss functions, which in turn require more hyperparameter tuning. Alternatively, data sampling baselines from the class imbalance literature (e.g., Undersampling, Upweighting), which can often be implemented in a single line of code and often have no hyperparameters, offer a cheaper and more efficient solution. However, these methods suffer from significant shortcomings. For example, Undersampling drops a significant part of the input distribution in every epoch, while Oversampling repeats samples, causing overfitting. To address these shortcomings, we introduce a new class-conditioned sampling method: Bias Mimicking (BM). The method is based on the observation that if the bias distribution of a class $c$, i.e., $P_D(B|Y=c)$, is mimicked by every other class $c^{\prime}\neq c$, then $Y$ and $B$ are statistically independent. Building on this observation, BM uses a novel training procedure that exposes the model to the entire distribution in every epoch without repeating samples. As a result, Bias Mimicking improves underrepresented-group accuracy over existing sampling methods by 3% across four benchmarks, while maintaining, and sometimes improving, performance relative to non-sampling methods. Code: https://github.com/mqraitem/Bias-Mimicking
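Why mimicking gives independence: if every class shares one bias distribution, $P_D(B=b|Y=y)=q(b)$ for all $y$, then $P_D(B=b)=\sum_y P_D(Y=y)\,q(b)=q(b)$, so $P_D(Y=y,B=b)=P_D(Y=y)\,P_D(B=b)$. The Python snippet below is a minimal sketch of the mimicking step only, not the authors' implementation (that lives in the linked repository). The helper `mimic_indices` and its parameters are hypothetical, and the assumption that BM builds one subset per class $c$, keeping class $c$ whole and subsampling every other class without replacement to match $P_D(B|Y=c)$, is our reading of the abstract.

```python
import math
from fractions import Fraction

import numpy as np


def mimic_indices(y, b, c, seed=0):
    """Hypothetical helper (not from the paper's code): return indices of a
    subset where class `c` keeps all of its samples and every other class is
    subsampled WITHOUT replacement so its bias distribution matches P(B | Y=c)."""
    y, b = np.asarray(y), np.asarray(b)
    rng = np.random.default_rng(seed)
    groups = np.unique(b)
    # Per-group counts of the reference class c (proportional to P(B | Y=c)).
    ref = [int(np.sum((y == c) & (b == g))) for g in groups]
    keep = [np.flatnonzero(y == c)]  # class c is never subsampled
    for other in np.unique(y):
        if other == c:
            continue
        idx = [np.flatnonzero((y == other) & (b == g)) for g in groups]
        # Largest scale r with r * ref[g] <= available count in every group
        # that class c occupies; Fraction keeps the arithmetic exact.
        r = min(Fraction(len(ix), m) for ix, m in zip(idx, ref) if m > 0)
        for ix, m in zip(idx, ref):
            k = math.floor(r * m)  # target size of this bias group
            if k > 0:
                keep.append(rng.choice(ix, size=k, replace=False))
    return np.sort(np.concatenate(keep))


# Toy data: class 0 is 80% bias group 0, class 1 is 40% bias group 0.
y = np.array([0] * 100 + [1] * 100)
b = np.array([0] * 80 + [1] * 20 + [0] * 40 + [1] * 60)

subsets = {int(c): mimic_indices(y, b, c) for c in np.unique(y)}
for c, idx in subsets.items():
    for cls in np.unique(y):
        sel = idx[y[idx] == cls]
        print(f"d_{c}: class {cls} n={len(sel)}, P(B=0)={np.mean(b[sel] == 0):.2f}")
```

In `d_0`, class 1 is cut to 40 + 10 samples so both classes have $P(B=0)=0.8$; in `d_1`, group sizes are rounded down, so class 0 only approximately matches (13 of 33 samples in group 0, about 0.39 versus 0.40). No subset repeats an index, and because each class is kept whole in its own subset, the union of the per-class subsets covers every sample, which is one way to read the "entire distribution per epoch without repeating samples" claim.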
