Paper title
Rumour detection using graph neural network and oversampling in benchmark Twitter dataset
Paper authors
Abstract
Recently, online social media has become a primary source of new information and of misinformation or rumours. In the absence of an automatic rumour detection system, the propagation of rumours has increased manifold, leading to serious societal damage. In this work, we propose a novel method for building an automatic rumour detection system that focuses on oversampling to alleviate the fundamental challenge of class imbalance in the rumour detection task. Our oversampling method relies on contextualised data augmentation to generate synthetic samples for underrepresented classes in the dataset. The key idea is to select which tweets in a thread to augment, using a non-random selection criterion that focuses the augmentation process on relevant tweets. Furthermore, we propose two graph neural networks (GNNs) to model non-linear conversations in a thread. To enhance the tweet representations in our method, we employ a custom feature selection technique based on the state-of-the-art BERTweet model. Experiments on three publicly available datasets confirm that 1) our GNN models outperform the current state-of-the-art classifiers by more than 20% (F1-score); 2) our oversampling technique increases model performance by more than 9% (F1-score); 3) focusing on relevant tweets for data augmentation via non-random selection criteria can further improve the results; and 4) our method has superior capabilities to detect rumours at a very early stage.
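The oversampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `oversample`, the relevance function `score`, and the rewriting function `augment` are all hypothetical placeholders. In particular, `augment` stands in for the contextualised augmentation step, and `score` stands in for whatever non-random selection criterion is used. The abstract does not specify either.

```python
import random
from collections import Counter


def oversample(threads, score, augment, k=1, seed=0):
    """Add synthetic threads until every class is as large as the largest one.

    threads: list of (tweets, label) pairs; tweets is a list of strings.
    score:   hypothetical relevance function, tweet -> float.
    augment: hypothetical rewriting function, tweet -> new tweet.
    k:       number of top-scoring tweets rewritten per synthetic thread.
    """
    if not threads:
        return []
    rng = random.Random(seed)
    counts = Counter(label for _, label in threads)
    target = max(counts.values())
    result = list(threads)
    for label, n in counts.items():
        pool = [t for t in threads if t[1] == label]
        for _ in range(target - n):
            tweets, _ = rng.choice(pool)
            # Non-random selection: rewrite the k highest-scoring tweets
            # instead of picking positions uniformly at random.
            ranked = sorted(range(len(tweets)),
                            key=lambda i: score(tweets[i]), reverse=True)
            chosen = set(ranked[:k])
            new_tweets = [augment(t) if i in chosen else t
                          for i, t in enumerate(tweets)]
            result.append((new_tweets, label))
    return result
```

With a toy criterion such as tweet length for `score` and `str.upper` for `augment`, a dataset with one rumour thread and three non-rumour threads gains two synthetic rumour threads, each with only its longest tweet rewritten. A real system would replace both placeholders with model-based components.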