Garain at SemEval-2020 Task 12: Sequence based Deep Learning for Categorizing Offensive Language in Social Media

Paper title

Garain at SemEval-2020 Task 12: Sequence based Deep Learning for Categorizing Offensive Language in Social Media

Authors

Garain, Avishek

Abstract

SemEval-2020 Task 12 was OffensEval: Multilingual Offensive Language Identification in Social Media (Zampieri et al., 2020). The task was subdivided by language, with a dataset provided for each one, and was further divided into three sub-tasks: offensive language identification, automatic categorization of offense types, and offense target identification. I participated in Sub-task C, offense target identification. To build the proposed system, I used deep learning networks such as LSTMs, implemented with frameworks such as Keras, combining a bag-of-words model with automatically generated sequence-based features and features manually extracted from the given dataset. Trained on 25% of the whole dataset, my system achieves a macro-averaged F1 score of 47.763%.
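The reported score is a macro-averaged F1: the F1 score is computed separately for each target class and the results are averaged without weighting, so a rare class counts as much as a frequent one. The sketch below is a minimal pure-Python illustration of that metric, not the paper's evaluation code. It assumes the OLID Sub-task C labels IND (individual), GRP (group) and OTH (other); the function name `macro_f1` and the toy label lists are hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels=None):
    """Unweighted mean of per-class F1 scores (macro-averaged F1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if labels is None:
        # Default to every label seen in either the gold or predicted lists.
        labels = sorted(set(y_true) | set(y_pred))

    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted class got a false positive
            fn[gold] += 1  # gold class got a false negative

    f1_scores = []
    for label in labels:
        p_denom = tp[label] + fp[label]
        r_denom = tp[label] + fn[label]
        precision = tp[label] / p_denom if p_denom else 0.0
        recall = tp[label] / r_denom if r_denom else 0.0
        if precision + recall == 0:
            f1_scores.append(0.0)
        else:
            f1_scores.append(2 * precision * recall / (precision + recall))

    # Macro average: every class has equal weight regardless of its frequency.
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy predictions for the three Sub-task C classes.
y_true = ["IND", "IND", "GRP", "OTH", "GRP", "IND"]
y_pred = ["IND", "GRP", "GRP", "IND", "GRP", "IND"]
print(round(macro_f1(y_true, y_pred, labels=["IND", "GRP", "OTH"]), 4))  # 0.4889
```

In this toy example the OTH class is never predicted correctly, so its F1 of 0 pulls the macro average down to about 0.49 even though four of six predictions are correct. This is why macro-averaged F1 is a demanding metric for an imbalanced, three-way task like offense target identification.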
