FairDistillation: Mitigating Stereotyping in Language Models

Paper Title

FairDistillation: Mitigating Stereotyping in Language Models

Authors

Pieter Delobelle, Bettina Berendt

Abstract

Large pre-trained language models are successfully being used in a variety of tasks, across many languages. With this ever-increasing usage, the risk of harmful side effects also rises, for example by reproducing and reinforcing stereotypes. However, detecting and mitigating these harms is difficult to do in general and becomes computationally expensive when tackling multiple languages or when considering different biases. To address this, we present FairDistillation: a cross-lingual method based on knowledge distillation to construct smaller language models while controlling for specific biases. We found that our distillation method does not negatively affect the downstream performance on most tasks and successfully mitigates stereotyping and representational harms. We demonstrate that FairDistillation can create fairer language models at a considerably lower cost than alternative approaches.
