Paper Title
AlexU-AIC at Arabic Hate Speech 2022: Contrast to Classify
Paper Authors
Paper Abstract
Online presence on social media platforms such as Facebook and Twitter has become a daily habit for internet users. Despite the vast range of services these platforms offer, users suffer from cyber-bullying, which leads to mental abuse and may escalate to physical harm against individuals or targeted groups. In this paper, we present our submission to the Arabic Hate Speech 2022 Shared Task Workshop (OSACT5 2022) using the associated Arabic Twitter dataset. The shared task consists of three sub-tasks: sub-task A focuses on detecting whether a tweet is offensive or not; then, for offensive tweets, sub-task B focuses on detecting whether the tweet is hate speech or not; finally, for hate speech tweets, sub-task C focuses on detecting the fine-grained type of hate speech among six different classes. Transformer models have proven their efficiency in classification tasks, but they tend to over-fit when fine-tuned on a small or imbalanced dataset. We overcome this limitation by investigating multiple training paradigms, such as contrastive learning and multi-task learning, along with classification fine-tuning and an ensemble of our top 5 performers. Our proposed solution achieved 0.841, 0.817, and 0.476 macro-averaged F1 in sub-tasks A, B, and C, respectively.
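The abstract reports a hard ensemble of the top 5 fine-tuned models evaluated with macro-averaged F1. The snippet below is a minimal sketch, not the authors' code, showing one common way such an ensemble and metric can be realized (majority voting over per-model label predictions, scored with scikit-learn's macro F1); the example arrays and the binary labels for sub-task A are hypothetical.

```python
# Minimal sketch (assumed, not the paper's implementation): hard-voting
# ensemble of the top-5 fine-tuned models, scored with macro-averaged F1.
import numpy as np
from sklearn.metrics import f1_score

def majority_vote(predictions: np.ndarray) -> np.ndarray:
    """predictions: (n_models, n_tweets) array of integer class labels."""
    n_classes = predictions.max() + 1
    # Count votes per class for every tweet, then pick the most-voted class.
    votes = np.apply_along_axis(
        lambda col: np.bincount(col, minlength=n_classes), 0, predictions
    )
    return votes.argmax(axis=0)

# Hypothetical predictions from 5 models on a binary sub-task
# (e.g. sub-task A: offensive = 1, not offensive = 0).
per_model_preds = np.array([
    [1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [1, 0, 1, 0, 0],
])
gold = np.array([1, 0, 1, 1, 0])

ensemble_preds = majority_vote(per_model_preds)
print("macro F1:", f1_score(gold, ensemble_preds, average="macro"))
```

Averaging per-model class probabilities (soft voting) is an equally plausible ensembling choice; the sketch uses hard voting only because it needs nothing beyond the predicted labels.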