Title

Multi-label topic classification for COVID-19 literature with Bioformer

Authors

Fang, Li; Wang, Kai

Abstract

We describe the Bioformer team's participation in the multi-label topic classification task for COVID-19 literature (Track 5 of BioCreative VII). Topic classification is performed using three BERT models (BioBERT, PubMedBERT, and Bioformer). We formulate the task as a sentence-pair classification problem, where the title is the first sentence and the abstract is the second. Our results show that Bioformer outperforms BioBERT and PubMedBERT on this task. Compared to the baseline, our best model increased the micro, macro, and instance-based F1 scores by 8.8%, 15.5%, and 7.4%, respectively. Bioformer achieved the highest micro F1 and macro F1 scores in this challenge. In post-challenge experiments, we found that pretraining Bioformer on COVID-19 articles further improves performance.
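The abstract reports gains in micro, macro, and instance-based F1. The following is a minimal Python sketch of the standard definitions, showing how the three averages differ on multi-label output. It is not the official BioCreative VII evaluation script. Two details are assumptions for illustration only: a score of 0.0 whenever F1 is undefined (no gold and no predicted topics), and the toy topic labels in the example.

```python
from typing import List, Sequence, Set


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    # Pool TP/FP/FN over every (document, topic) decision, then compute one F1.
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    return f1_from_counts(tp, fp, fn)


def macro_f1(gold: List[Set[str]], pred: List[Set[str]], labels: Sequence[str]) -> float:
    # Compute F1 separately for each topic, then take the unweighted mean,
    # so rare topics count as much as frequent ones.
    scores = []
    for label in labels:
        tp = sum(label in g and label in p for g, p in zip(gold, pred))
        fp = sum(label not in g and label in p for g, p in zip(gold, pred))
        fn = sum(label in g and label not in p for g, p in zip(gold, pred))
        scores.append(f1_from_counts(tp, fp, fn))
    return sum(scores) / len(scores)


def instance_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    # Compute F1 between the gold and predicted topic sets of each document,
    # then average over documents.
    scores = [f1_from_counts(len(g & p), len(p - g), len(g - p)) for g, p in zip(gold, pred)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy data: three documents with illustrative topic labels.
    labels = ["Treatment", "Mechanism", "Prevention", "Transmission", "Diagnosis", "Case Report"]
    gold = [{"Treatment", "Mechanism"}, {"Prevention"}, {"Diagnosis", "Case Report"}]
    pred = [{"Treatment"}, {"Prevention", "Transmission"}, {"Diagnosis", "Case Report"}]
    print(round(micro_f1(gold, pred), 4))          # 0.8
    print(round(macro_f1(gold, pred, labels), 4))  # 0.6667
    print(round(instance_f1(gold, pred), 4))       # 0.7778
```

Macro F1 weights every topic equally, so errors or gains on rare topics move it more than micro F1, which pools all decisions across topics.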