Title
ArCovidVac: Analyzing Arabic Tweets About COVID-19 Vaccination
Authors
Abstract
The emergence of the COVID-19 pandemic and the first global infodemic have changed our lives in many different ways. We relied on social media to get the latest information about the COVID-19 pandemic and, at the same time, to disseminate information. The content on social media consisted not only of health-related advice, plans, and informative news from policy makers, but also of conspiracies and rumors. It became important to identify such information as soon as it was posted in order to make actionable decisions (e.g., debunking rumors or taking certain measures for traveling). To address this challenge, we develop and publicly release ArCovidVac, the first and largest manually annotated Arabic tweet dataset for the COVID-19 vaccination campaign, covering many countries in the Arab region. The dataset is enriched with several layers of annotation, including (i) informativeness (more vs. less important tweets); (ii) fine-grained tweet content types (e.g., advice, rumors, restriction, authentic news/information); and (iii) stance towards vaccination (pro-vaccination, neutral, anti-vaccination). Further, we performed an in-depth analysis of the data, exploring the popularity of different vaccines, trending hashtags, topics, and the presence of offensive content in the tweets. We studied the data for individual tweet types and examined temporal changes in stance towards vaccination. We benchmarked the ArCovidVac dataset using transformer architectures for informativeness, content type, and stance detection.
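The three annotation layers can be pictured as one labeled record per tweet. The sketch below is a minimal illustration that relies on several assumptions. The class names, field names, and informativeness label strings are hypothetical and do not reflect the released dataset's actual schema or file format. The stance values follow the three classes named in the abstract. The content-type set lists only the examples the abstract gives, not the full label inventory. The hypothetical helper `monthly_stance_share` shows one possible way to examine temporal changes in stance: it computes the share of each stance per calendar month.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Informativeness(Enum):
    # Hypothetical label strings for the "more vs. less important" layer.
    MORE = "more informative"
    LESS = "less informative"


class Stance(Enum):
    # The three stance classes named in the abstract.
    PRO = "pro-vaccination"
    NEUTRAL = "neutral"
    ANTI = "anti-vaccination"


# Only the content types the abstract gives as examples; the full label
# set is not listed there, so this set is deliberately partial.
EXAMPLE_CONTENT_TYPES = {"advice", "rumors", "restriction", "authentic news/information"}


@dataclass
class AnnotatedTweet:
    # Hypothetical record: one tweet with all three annotation layers.
    text: str
    posted: date
    informativeness: Informativeness
    content_type: str  # free string, because the full label set is unknown here
    stance: Stance


def monthly_stance_share(tweets):
    """Return {(year, month): {Stance: fraction}} for the given tweets."""
    counts = defaultdict(Counter)
    for t in tweets:
        counts[(t.posted.year, t.posted.month)][t.stance] += 1
    shares = {}
    for month, c in sorted(counts.items()):
        total = sum(c.values())  # always > 0: a month appears only if it has a tweet
        shares[month] = {s: c[s] / total for s in Stance}
    return shares
```

For example, two toy tweets from January 2021, one labeled pro-vaccination and one labeled anti-vaccination, give a January share of 0.5 for each of those stances and 0.0 for neutral.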