Paper title
Multilingual Twitter Corpus and Baselines for Evaluating Demographic Bias in Hate Speech Recognition
Paper authors
Paper abstract
Existing research on fairness evaluation of document classification models mainly uses synthetic monolingual data without ground truth for author demographic attributes. In this work, we assemble and publish a multilingual Twitter corpus for the task of hate speech detection, annotated with four inferred author demographic factors: age, country, gender and race/ethnicity. The corpus covers five languages: English, Italian, Polish, Portuguese and Spanish. We evaluate the inferred demographic labels using a crowdsourcing platform, Figure Eight. To examine factors that can cause biases, we conduct an empirical analysis of demographic predictability on the English corpus. We measure the performance of four popular document classifiers and evaluate the fairness and bias of these baseline classifiers with respect to the author-level demographic attributes.