Paper title
Overview of Abusive and Threatening Language Detection in Urdu at FIRE 2021
Authors
Abstract
As the influence of social media platforms grows, so does the impact of their misuse. The importance of automatically detecting threatening and abusive language cannot be overestimated. However, most existing studies and state-of-the-art methods target English, with limited work on low- and medium-resource languages. In this paper, we present two shared tasks on abusive and threatening language detection for Urdu, a language with more than 170 million speakers worldwide. Both are posed as binary classification tasks in which participating systems must classify Urdu tweets into two classes: (i) Abusive and Non-Abusive for the first task, and (ii) Threatening and Non-Threatening for the second. We present two manually annotated datasets of tweets labelled as (i) Abusive or Non-Abusive and (ii) Threatening or Non-Threatening. The abusive dataset contains 2,400 annotated tweets in the training split and 1,100 in the test split. The threatening dataset contains 6,000 annotated tweets in the training split and 3,950 in the test split. We also provide logistic regression and BERT-based baseline classifiers for both tasks. Twenty-one teams from six countries (India, Pakistan, China, Malaysia, the United Arab Emirates, and Taiwan) registered for the shared task. Ten teams submitted runs for Subtask A (Abusive Language Detection), nine teams submitted runs for Subtask B (Threatening Language Detection), and seven teams submitted technical reports. The best F1-scores achieved were 0.880 for Subtask A and 0.545 for Subtask B. For both subtasks, m-BERT-based transformer models performed best.
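Both subtasks report results as an F1-score. The abstract does not say which variant is used: F1 of the positive class (Abusive or Threatening) or macro-averaged F1 over both classes. The minimal sketch below computes both for binary labels, in plain Python with no dependencies. It is illustrative only. The function names `binary_f1` and `macro_f1` are hypothetical, and encoding the positive class as 1 and the negative class as 0 is an assumption.

```python
from typing import Sequence


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 of a single class, treating `positive` as the positive label.

    Assumption (illustrative): labels are 0/1, with 1 = Abusive/Threatening.
    Uses the identity F1 = 2*TP / (2*TP + FP + FN).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    # If the class never occurs in either labels or predictions, report 0.0.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores for labels 1 and 0."""
    return (binary_f1(y_true, y_pred, positive=1)
            + binary_f1(y_true, y_pred, positive=0)) / 2
```

Take true labels `[1, 0, 0, 1, 0, 0]` and predictions `[1, 1, 0, 0, 0, 0]`. Here `binary_f1` returns 0.5, while `macro_f1` returns 0.625. The macro average is higher because the majority (negative) class is predicted well. The same system can therefore score noticeably differently depending on which variant a task reports.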