Paper Title
BEIKE NLP at SemEval-2022 Task 4: Prompt-Based Paragraph Classification for Patronizing and Condescending Language Detection
Paper Authors
Paper Abstract
The PCL detection task is aimed at identifying and categorizing language that is patronizing or condescending towards vulnerable communities in the general media. Compared to other paragraph-classification NLP tasks, the negative language presented in the PCL detection task is usually more implicit and subtle, and therefore harder to recognize, which makes the performance of common text-classification approaches disappointing. Targeting the PCL detection problem in SemEval-2022 Task 4, in this paper we present our team's solution, which exploits the power of prompt-based learning for paragraph classification. We reformulate the task as an appropriate cloze prompt and use pre-trained Masked Language Models to fill the cloze slot. For the two subtasks, binary classification and multi-label classification, a DeBERTa model is adopted and fine-tuned to predict the masked label words of task-specific prompts. On the evaluation dataset, our approach achieves an F1-score of 0.6406 for binary classification; for multi-label classification, it achieves a macro-F1-score of 0.4689 and ranks first on the leaderboard.
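To make the prompt-based setup concrete, the sketch below (not the authors' implementation) shows how a paragraph could be wrapped in a cloze template and classified by letting a masked language model choose between verbalizer words at the mask position, using the Hugging Face `transformers` library. The checkpoint name, prompt template, and verbalizer words are assumptions for illustration, the label words are assumed to be single tokens, and the fine-tuning step described in the abstract is omitted.

```python
# Minimal cloze-prompt classification sketch (illustrative only, not the paper's code).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

checkpoint = "microsoft/deberta-base"  # placeholder; any masked-LM checkpoint could be used
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)
model.eval()

# Hypothetical verbalizer: one single-token label word per class (0 = not PCL, 1 = PCL).
label_words = ["no", "yes"]
label_word_ids = [
    tokenizer(" " + w, add_special_tokens=False)["input_ids"][0] for w in label_words
]

def classify(paragraph: str) -> int:
    """Fill the cloze slot and return the index of the highest-scoring label word."""
    # Truncate the paragraph itself so the prompt tail with the mask is never cut off.
    para_ids = tokenizer(paragraph, add_special_tokens=False,
                         truncation=True, max_length=400)["input_ids"]
    paragraph = tokenizer.decode(para_ids)
    prompt = (f"{paragraph} Is this text patronizing or condescending? "
              f"{tokenizer.mask_token}.")
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    word_logits = logits[0, mask_pos[0], label_word_ids]
    return int(word_logits.argmax())

print(classify("They need our help; these poor souls cannot manage on their own."))
```

In an actual system along these lines, the masked-LM head would be fine-tuned so that the probability mass at the mask position concentrates on the task's label words, and the multi-label subtask would use one prompt (or one verbalizer word) per category.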