Paper Title

Entropy-based Attention Regularization Frees Unintended Bias Mitigation from Lists

Paper Authors

Giuseppe Attanasio, Debora Nozza, Dirk Hovy, Elena Baralis

Paper Abstract

Natural Language Processing (NLP) models risk overfitting to specific terms in the training data, thereby reducing their performance, fairness, and generalizability. For example, neural hate speech detection models are strongly influenced by identity terms like gay or women, resulting in false positives, severe unintended bias, and lower performance. Most mitigation techniques use lists of identity terms or samples from the target domain during training. However, this approach requires a priori knowledge and introduces further bias if important terms are neglected. Instead, we propose a knowledge-free Entropy-based Attention Regularization (EAR) to discourage overfitting to training-specific terms. An additional objective function penalizes tokens with low self-attention entropy. We fine-tune BERT via EAR: the resulting model matches or exceeds state-of-the-art performance for hate speech classification and bias metrics on three benchmark corpora in English and Italian. EAR also reveals overfitting terms, i.e., terms most likely to induce bias, to help identify their effect on the model, task, and predictions.
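The mechanism the abstract describes, an auxiliary loss term that penalizes tokens with low self-attention entropy, is compact enough to sketch. Below is a minimal, illustrative PyTorch/Hugging Face sketch assuming a loss of the form L = L_task − α·H̄, where H̄ is the mean token-level attention entropy; the hyperparameter `alpha`, the helper `attention_entropy`, and the choice to average attention over heads before taking entropy are all our assumptions, not the authors' released implementation.

```python
# Illustrative sketch of entropy-based attention regularization (EAR-style),
# assuming PyTorch and Hugging Face transformers. Names and hyperparameters
# are assumptions for this sketch, not the paper's official code.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2, output_attentions=True
)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model.train()

def attention_entropy(attentions, attention_mask, eps=1e-9):
    """Mean self-attention entropy per token, averaged over heads,
    non-padding tokens, and layers."""
    mask = attention_mask.float()                         # (batch, seq)
    per_layer = []
    for layer_att in attentions:                          # (batch, heads, seq, seq)
        probs = layer_att.mean(dim=1)                     # average heads -> (batch, seq, seq)
        ent = -(probs * torch.log(probs + eps)).sum(-1)   # entropy per token -> (batch, seq)
        ent = (ent * mask).sum(-1) / mask.sum(-1)         # mean over real tokens -> (batch,)
        per_layer.append(ent)
    return torch.stack(per_layer).mean()                  # scalar

alpha = 0.01  # assumed regularization strength

batch = tokenizer(["example input text"], return_tensors="pt")
labels = torch.tensor([0])
outputs = model(**batch, labels=labels)

# Subtracting the entropy term penalizes low-entropy (overly focused)
# attention: minimizing the total loss pushes attention to spread out
# instead of latching onto single training-specific terms.
loss = outputs.loss - alpha * attention_entropy(outputs.attentions, batch["attention_mask"])
loss.backward()
```

In the same spirit, the per-token entropies computed above could be aggregated by vocabulary term over a corpus after training; terms with consistently low entropy would be candidates for the "overfitting terms" the abstract says EAR reveals.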
