Paper title

Context-Aware Discrimination Detection in Job Vacancies using Computational Language Models

Authors

Vethman, S., Adhikari, A., de Boer, M. H. T., van Genabeek, J. A. G. M., Veenman, C. J.

Abstract

Discriminatory job vacancies are disapproved worldwide, but remain persistent. Discrimination in job vacancies can be explicit by directly referring to demographic memberships of candidates. More implicit forms of discrimination are also present that may not always be illegal but still influence the diversity of applicants. Explicit written discrimination is still present in numerous job vacancies, as was recently observed in the Netherlands. Current efforts for the detection of explicit discrimination concern the identification of job vacancies containing potentially discriminating terms such as "young" or "male". However, automatic detection is inefficient due to low precision: e.g. "we are a young company" or "working with mostly male patients" are phrases that contain explicit terms, while the context shows that these do not reflect discriminatory content. In this paper, we show how machine learning based computational language models can raise precision in the detection of explicit discrimination by identifying when the potentially discriminating terms are used in a discriminatory context. We focus on gender discrimination, which indeed suffers from low precision when filtering explicit terms. First, we created a data set for gender discrimination in job vacancies. Second, we investigated a variety of computational language models for discriminatory context detection. Third, we evaluated the capability of these models to detect unforeseen discriminating terms in context. The results show that machine learning based methods can detect explicit gender discrimination with high precision and help in finding new forms of discrimination. Accordingly, the proposed methods can substantially increase the effectiveness of detecting job vacancies which are highly suspected to be discriminatory. In turn, this may lower the discrimination experienced at the start of the recruitment process.
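The low-precision problem described in the abstract can be illustrated with a minimal sketch of term-based filtering. This is not the authors' pipeline: the term list, the function names `flag_by_terms` and `precision`, and the last two toy phrases with their labels are assumptions made for illustration. Only the first two phrases come from the abstract.

```python
import re

# Hypothetical term list; the abstract names only "young" and "male" as examples.
EXPLICIT_TERMS = ("young", "male")


def flag_by_terms(text, terms=EXPLICIT_TERMS):
    """Return the explicit terms that occur as whole words in the text."""
    return [
        term
        for term in terms
        if re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE)
    ]


def precision(flags, labels):
    """Fraction of flagged items that are truly discriminatory."""
    flagged = sum(1 for f in flags if f)
    true_positives = sum(1 for f, l in zip(flags, labels) if f and l)
    return true_positives / flagged if flagged else 0.0


# (phrase, is_discriminatory) pairs. Labels are assumptions for this toy example.
PHRASES = [
    ("we are a young company", False),                     # from the abstract
    ("working with mostly male patients", False),          # from the abstract
    ("we are looking for a young male candidate", True),   # invented example
    ("experience with Python is required", False),         # invented example
]

flags = [bool(flag_by_terms(text)) for text, _ in PHRASES]
labels = [label for _, label in PHRASES]

if __name__ == "__main__":
    for (text, label), flagged in zip(PHRASES, flags):
        print(f"flagged={flagged!s:5} discriminatory={label!s:5} {text}")
    print("precision of term filter:", precision(flags, labels))
```

On this toy set the filter flags three phrases, of which only one is labeled discriminatory. A context-aware classifier of the kind the paper investigates would aim to reject the first two while keeping the third.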
