Paper title
Bringing the People Back In: Contesting Benchmark Machine Learning Datasets
Paper authors
Abstract
In response to algorithmic unfairness embedded in sociotechnical systems, significant attention has been focused on the contents of machine learning datasets, which have revealed biases towards white, cisgender, male, and Western data subjects. In contrast, comparatively little attention has been paid to the histories, values, and norms embedded in such datasets. In this work, we outline a research program, a genealogy of machine learning data, for investigating how and why these datasets have been created, what and whose values influence the choices of data to collect, and the contextual and contingent conditions of their creation. We describe the ways in which benchmark datasets in machine learning operate as infrastructure and pose four research questions for these datasets. This interrogation forces us to "bring the people back in" by helping us understand the labor embedded in dataset construction, thereby presenting new avenues of contestation for other researchers encountering the data.