Paper Title
Z-Index at CheckThat! Lab 2022: Check-Worthiness Identification on Tweet Text
Paper Authors
Paper Abstract
The wide use of social media and digital technologies facilitates the sharing of news and information about events and activities. Alongside positive information, however, misleading and false information also spreads on social media. There have been efforts to identify such misleading information, both manually by human experts and with automatic tools. Manual effort does not scale well because of the high volume of information containing factual claims that appears online. Therefore, automatically identifying check-worthy claims can be very useful for human experts. In this study, we describe our participation in Subtask-1A: Check-worthiness of tweets (English, Dutch, and Spanish) of the CheckThat! lab at CLEF 2022. We performed standard preprocessing steps and applied different models to identify whether a given text is worth fact-checking. We used an oversampling technique to balance the dataset and applied SVM and Random Forest (RF) classifiers with TF-IDF representations. We also used BERT multilingual (BERT-m) and XLM-RoBERTa-base pre-trained models in our experiments. We used BERT-m for the official submissions, and our systems ranked 3rd, 5th, and 12th for Spanish, Dutch, and English, respectively. In further experiments, our evaluation shows that the transformer models (BERT-m and XLM-RoBERTa-base) outperform SVM and RF for Dutch and English, whereas a different trend is observed for Spanish.
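
The following is a minimal illustrative sketch, not the authors' released code, of the classical pipeline the abstract describes: random oversampling of the minority (check-worthy) class, TF-IDF features, and SVM / Random Forest classifiers using scikit-learn. The file name and the column names tweet_text and class_label are assumptions made for illustration.

# Sketch of oversampling + TF-IDF + SVM/RF (assumed data layout, not the paper's exact setup).
import pandas as pd
from sklearn.utils import resample
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

# Hypothetical training file with "tweet_text" and binary "class_label" columns.
train = pd.read_csv("train_english.tsv", sep="\t")

# Balance the training set by oversampling the minority (check-worthy) class.
majority = train[train["class_label"] == 0]
minority = train[train["class_label"] == 1]
minority_up = resample(minority, replace=True, n_samples=len(majority), random_state=42)
balanced = pd.concat([majority, minority_up])

X, y = balanced["tweet_text"], balanced["class_label"]

for clf in (LinearSVC(), RandomForestClassifier(n_estimators=200, random_state=42)):
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2), clf)
    model.fit(X, y)
    # In practice, evaluate on a held-out dev split; training-set F1 shown here only for brevity.
    print(type(clf).__name__, f1_score(y, model.predict(X)))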
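
For the transformer systems, a comparable hedged sketch of fine-tuning multilingual BERT (BERT-m) for binary check-worthiness classification with Hugging Face Transformers is shown below; the file path, column names, and hyperparameters are assumptions for illustration, not the settings reported in the paper, and XLM-RoBERTa-base can be swapped in via the model name.

# Sketch of fine-tuning bert-base-multilingual-cased for check-worthiness (assumed data layout).
import pandas as pd
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

train_df = pd.read_csv("train_english.tsv", sep="\t")  # hypothetical path
dataset = Dataset.from_pandas(
    train_df[["tweet_text", "class_label"]].rename(columns={"class_label": "labels"}))

model_name = "bert-base-multilingual-cased"  # BERT-m; "xlm-roberta-base" is the other model used
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

def tokenize(batch):
    # Tokenize tweets; max_length is an assumed value.
    return tokenizer(batch["tweet_text"], truncation=True, max_length=128)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(output_dir="checkworthiness-bertm",
                         per_device_train_batch_size=16,
                         num_train_epochs=3,
                         learning_rate=2e-5)

Trainer(model=model, args=args, train_dataset=dataset, tokenizer=tokenizer).train()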