Title
Dartmouth CS at WNUT-2020 Task 2: Informative COVID-19 Tweet Classification Using BERT
Authors
Abstract
We describe the systems developed for the WNUT-2020 shared task 2, identification of informative COVID-19 English Tweets. BERT is a highly performant model for Natural Language Processing tasks. We improved BERT's performance on this classification task by fine-tuning BERT, concatenating its embeddings with Tweet-specific features, and training a Support Vector Machine (SVM) for classification (henceforth called BERT+). We compared its performance to a suite of machine learning models. We used a Twitter-specific data cleaning pipeline and word-level TF-IDF to extract features for the non-BERT models. BERT+ was the top-performing model, with an F1-score of 0.8713.
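The core idea behind BERT+ is to append handcrafted, Tweet-specific features to the fine-tuned BERT embedding before classification with an SVM. The abstract does not specify which features were used, so the sketch below uses illustrative stand-ins (hashtag, mention, and URL counts plus token length), and a short dummy vector in place of a real 768-dimensional BERT embedding:

```python
import re

def tweet_features(text):
    """Illustrative Tweet-specific features (hypothetical; the paper's
    actual feature set is not given in the abstract): counts of
    hashtags, user mentions, URLs, and whitespace-delimited tokens."""
    return [
        len(re.findall(r"#\w+", text)),          # hashtag count
        len(re.findall(r"@\w+", text)),          # mention count
        len(re.findall(r"https?://\S+", text)),  # URL count
        len(text.split()),                       # token count
    ]

def bert_plus_vector(bert_embedding, text):
    """Concatenate a precomputed BERT embedding with the handcrafted
    features; in BERT+ this combined vector is what the SVM is trained on."""
    return list(bert_embedding) + tweet_features(text)

# Stand-in for a fine-tuned BERT sentence embedding (real ones are 768-dim).
fake_embedding = [0.1, -0.2, 0.3]
vec = bert_plus_vector(fake_embedding, "New #COVID19 cases reported https://t.co/x @WHO")
```

An SVM (e.g. scikit-learn's `SVC`) would then be fit on a matrix of such concatenated vectors, one row per Tweet.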