Towards Intersectionality in Machine Learning: Including More Identities, Handling Underrepresentation, and Performing Evaluation

Paper title

Towards Intersectionality in Machine Learning: Including More Identities, Handling Underrepresentation, and Performing Evaluation

Authors

Wang, Angelina; Ramaswamy, Vikram V.; Russakovsky, Olga

Abstract

Research in machine learning fairness has historically considered a single binary demographic attribute; however, the reality is of course far more complicated. In this work, we grapple with questions that arise along three stages of the machine learning pipeline when incorporating intersectionality as multiple demographic attributes: (1) which demographic attributes to include as dataset labels, (2) how to handle the progressively smaller size of subgroups during model training, and (3) how to move beyond existing evaluation metrics when benchmarking model fairness for more subgroups. For each question, we provide thorough empirical evaluation on tabular datasets derived from the US Census, and present constructive recommendations for the machine learning community. First, we advocate for supplementing domain knowledge with empirical validation when choosing which demographic attribute labels to train on, while always evaluating on the full set of demographic attributes. Second, we warn against using data imbalance techniques without considering their normative implications and suggest an alternative using the structure in the data. Third, we introduce new evaluation metrics which are more appropriate for the intersectional setting. Overall, we provide substantive suggestions on three necessary (albeit not sufficient!) considerations when incorporating intersectionality into machine learning.
