Title
Dartmouth CS at WNUT-2020 Task 2: Informative COVID-19 Tweet Classification Using BERT
Authors
Abstract
We describe the systems developed for the WNUT-2020 shared task 2, identification of informative COVID-19 English Tweets. BERT is a highly performant model for Natural Language Processing tasks. We improved BERT's performance on this classification task by fine-tuning BERT, concatenating its embeddings with Tweet-specific features, and training a Support Vector Machine (SVM) for classification (henceforth called BERT+). We compared its performance to a suite of machine learning models. We used a Twitter-specific data cleaning pipeline and word-level TF-IDF to extract features for the non-BERT models. BERT+ was the top-performing model, with an F1-score of 0.8713.
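The core idea behind BERT+ is to append handcrafted, Tweet-specific features to the fine-tuned BERT embedding before classification with an SVM. The abstract does not specify which features were used, so the sketch below uses illustrative stand-ins (hashtag, mention, and URL counts plus token length), and a short dummy vector in place of a real 768-dimensional BERT embedding:

```python
import re

def tweet_features(text):
    """Illustrative Tweet-specific features (hypothetical; the paper's
    actual feature set is not given in the abstract): counts of
    hashtags, user mentions, URLs, and whitespace-delimited tokens."""
    return [
        len(re.findall(r"#\w+", text)),          # hashtag count
        len(re.findall(r"@\w+", text)),          # mention count
        len(re.findall(r"https?://\S+", text)),  # URL count
        len(text.split()),                       # token count
    ]

def bert_plus_vector(bert_embedding, text):
    """Concatenate a precomputed BERT embedding with the handcrafted
    features; in BERT+ this combined vector is what the SVM is trained on."""
    return list(bert_embedding) + tweet_features(text)

# Stand-in for a fine-tuned BERT sentence embedding (real ones are 768-dim).
fake_embedding = [0.1, -0.2, 0.3]
vec = bert_plus_vector(fake_embedding, "New #COVID19 cases reported https://t.co/x @WHO")
```

An SVM (e.g. scikit-learn's `SVC`) would then be fit on a matrix of such concatenated vectors, one row per Tweet.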