Multi-channel CNN to classify Nepali COVID-19 related tweets using hybrid features

Paper title

Multi-channel CNN to classify Nepali COVID-19 related tweets using hybrid features

Authors

Chiranjibi Sitaula, Tej Bahadur Shahi

Abstract

The COVID-19 pandemic and the growing fear it has caused have triggered several health complications, such as depression and anxiety. These complications have affected not only developed countries but also developing countries such as Nepal. They can be understood from people's tweets and comments posted online, after proper analysis and sentiment classification. However, because each tweet contains only a limited number of tokens/words, it is crucial to capture multiple kinds of information from it. In this study, we first represent each tweet by combining syntactic and semantic information, called hybrid features. The syntactic information is generated with the bag-of-words method, whereas the semantic information is generated by combining the fastText-based (ft) and domain-specific (ds) methods. Second, we design a novel multi-channel convolutional neural network (MCNN), which ensembles multiple CNNs to capture multi-scale information for better classification. Last, we evaluate the efficacy of both the proposed feature extraction method and the MCNN model in classifying tweets into three sentiment classes (positive, neutral, and negative) on the NepCOV19Tweets dataset, which is the only public COVID-19 tweets dataset in the Nepali language. The evaluation results show that the proposed hybrid features outperform individual feature extraction methods, with the highest classification accuracy of 69.7%, and that the MCNN model outperforms existing methods, with the highest classification accuracy of 71.3%.
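
The two ideas in the abstract, hybrid features and parallel CNN channels, can be illustrated with a small dependency-free Python sketch. It is not the authors' implementation: the abstract does not give the architecture, so the concatenation of the three feature vectors, the kernel sizes (1, 2, 3), the number of filters, the random untrained weights, and the toy input values are all assumptions made for illustration. Only the forward pass is shown; no training is performed.

```python
import math
import random


def hybrid_features(bow, ft, ds):
    """Join syntactic (bag-of-words) and semantic (fastText, domain-specific)
    vectors into one vector. Plain concatenation is an assumption here."""
    return list(bow) + list(ft) + list(ds)


def conv1d_relu(x, kernel, bias=0.0):
    """Valid 1D convolution (cross-correlation) followed by ReLU."""
    k = len(kernel)
    return [max(0.0, sum(x[i + j] * kernel[j] for j in range(k)) + bias)
            for i in range(len(x) - k + 1)]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class MultiChannelCNN:
    """Hypothetical multi-channel model: one channel per kernel size, each
    channel global-max-pooled, all channels concatenated, then a dense
    softmax layer. Weights are random, so predictions are not meaningful."""

    def __init__(self, kernel_sizes=(1, 2, 3), filters_per_channel=4,
                 n_classes=3, seed=0):
        rng = random.Random(seed)
        self.channels = [
            [[rng.uniform(-1, 1) for _ in range(k)]
             for _ in range(filters_per_channel)]
            for k in kernel_sizes
        ]
        n_pooled = len(kernel_sizes) * filters_per_channel
        self.dense = [[rng.uniform(-1, 1) for _ in range(n_pooled)]
                      for _ in range(n_classes)]

    def predict_proba(self, x):
        pooled = []
        for filters in self.channels:        # each channel sees a different scale
            for kernel in filters:
                pooled.append(max(conv1d_relu(x, kernel)))  # global max pooling
        logits = [sum(w * p for w, p in zip(row, pooled)) for row in self.dense]
        return softmax(logits)


# Toy vectors standing in for real tweet features (made-up values).
bow = [1.0, 0.0, 2.0, 0.0]
ft = [0.12, -0.40, 0.33]
ds = [0.05, 0.71]

x = hybrid_features(bow, ft, ds)
probs = MultiChannelCNN().predict_proba(x)
labels = ["positive", "neutral", "negative"]
prediction = labels[probs.index(max(probs))]
```

Using several kernel sizes side by side is one common way to capture multi-scale information: a width-1 kernel responds to single feature values, while wider kernels respond to patterns across neighbouring values. Whether the paper's channels are arranged this way is not stated in the abstract.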
