Paper Title
"What We Can't Measure, We Can't Understand": Challenges to Demographic Data Procurement in the Pursuit of Fairness
Authors
Abstract
As calls for fair and unbiased algorithmic systems increase, so too does the number of individuals working on algorithmic fairness in industry. However, these practitioners often do not have access to the demographic data they feel they need to detect bias in practice. Even with the growing variety of toolkits and strategies for working towards algorithmic fairness, they almost invariably require access to demographic attributes or proxies. We investigated this dilemma through semi-structured interviews with 38 practitioners and professionals either working in or adjacent to algorithmic fairness. Participants painted a complex picture of what demographic data availability and use look like on the ground, ranging from not having access to personal data of any kind to being legally required to collect and use demographic data for discrimination assessments. In many domains, demographic data collection raises a host of difficult questions, including how to balance privacy and fairness, how to define relevant social categories, how to ensure meaningful consent, and whether it is appropriate for private companies to infer someone's demographics. Our research suggests challenges that must be considered by businesses, regulators, researchers, and community groups in order to enable practitioners to address algorithmic bias in practice. Critically, we do not propose that the overall goal of future work should be to simply lower the barriers to collecting demographic data. Rather, our study surfaces a swath of normative questions about how, when, and whether this data should be procured, and, in cases where it is not, what should still be done to mitigate bias.