Galileo at SemEval-2020 Task 12: Multi-lingual Learning for Offensive Language Identification using Pre-trained Language Models

Paper Title

Galileo at SemEval-2020 Task 12: Multi-lingual Learning for Offensive Language Identification using Pre-trained Language Models

Authors

Wang, Shuohuan, Liu, Jiaxiang, Ouyang, Xuan, Sun, Yu

Abstract

This paper describes Galileo's performance in SemEval-2020 Task 12 on detecting and categorizing offensive language in social media. For offensive language identification, we proposed a multi-lingual method using the pre-trained language models ERNIE and XLM-R. For offensive language categorization, we proposed a knowledge distillation method trained on soft labels generated by several supervised models. Our team participated in all three sub-tasks. In Sub-task A (Offensive Language Identification), we ranked first in terms of average F1 score across all languages. We were also the only team that ranked among the top three in every language. We also took first place in Sub-task B (Automatic Categorization of Offense Types) and Sub-task C (Offense Target Identification).
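
The abstract mentions training on soft labels produced by several supervised models but does not say how those labels are combined or which loss is used. The following is only a minimal sketch of the general idea, assuming the teachers' class probabilities are simply averaged and the student is trained with cross-entropy against that averaged distribution. The function names (`average_soft_labels`, `soft_cross_entropy`) and the toy numbers are hypothetical and are not taken from the paper.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def average_soft_labels(teacher_probs):
    # Assumption: soft labels are the plain mean of the teachers'
    # per-class probabilities for one example.
    n = len(teacher_probs)
    return [sum(col) / n for col in zip(*teacher_probs)]

def soft_cross_entropy(student_logits, soft_label):
    # Cross-entropy between the soft target distribution and the
    # student's predicted distribution.
    probs = softmax(student_logits)
    return -sum(t * math.log(p) for t, p in zip(soft_label, probs))

# Toy example: three hypothetical teachers scoring one post as
# [offensive, not offensive].
teachers = [[0.9, 0.1], [0.7, 0.3], [0.8, 0.2]]
soft_label = average_soft_labels(teachers)
loss = soft_cross_entropy([2.0, 0.0], soft_label)
print(soft_label, loss)
```

Unlike a hard 0/1 label, the averaged target keeps the teachers' disagreement, so the student is penalized less on examples the teachers themselves found ambiguous.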
