Paper Title
To BERT or Not to BERT: Comparing Task-specific and Task-agnostic Semi-Supervised Approaches for Sequence Tagging
Paper Authors
Paper Abstract
Leveraging large amounts of unlabeled data using Transformer-like architectures, such as BERT, has gained popularity in recent times owing to their effectiveness in learning general representations that can then be fine-tuned for downstream tasks to much success. However, training these models can be costly from both an economic and an environmental standpoint. In this work, we investigate how to effectively use unlabeled data: by exploring the task-specific semi-supervised approach, Cross-View Training (CVT), and comparing it with task-agnostic BERT in multiple settings that include domain- and task-relevant English data. CVT uses a much lighter model architecture, and we show that it achieves similar performance to BERT on a set of sequence tagging tasks, with lesser financial and environmental impact.