Paper Title
HateBERT: Retraining BERT for Abusive Language Detection in English
Paper Authors
Paper Abstract
In this paper, we introduce HateBERT, a re-trained BERT model for abusive language detection in English. The model was trained on RAL-E, a large-scale dataset of Reddit comments in English from communities banned for being offensive, abusive, or hateful, which we have collected and made available to the public. We present the results of a detailed comparison between a general pre-trained language model and the abuse-inclined version obtained by retraining on posts from the banned communities, evaluated on three English datasets for offensive language, abusive language, and hate speech detection tasks. On all datasets, HateBERT outperforms the corresponding general BERT model. We also discuss a battery of experiments comparing the portability of the generic pre-trained language model and its abusive-language-inclined counterpart across the datasets, indicating that portability is affected by the compatibility of the annotated phenomena.