Paper Title

Pre-Training Transformers as Energy-Based Cloze Models

Paper Authors

Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning

Paper Abstract

We introduce Electric, an energy-based cloze model for representation learning over text. Like BERT, it is a conditional generative model of tokens given their contexts. However, Electric does not use masking or output a full distribution over tokens that could occur in a context. Instead, it assigns a scalar energy score to each input token indicating how likely it is given its context. We train Electric using an algorithm based on noise-contrastive estimation and elucidate how this learning objective is closely related to the recently proposed ELECTRA pre-training method. Electric performs well when transferred to downstream tasks and is particularly effective at producing likelihood scores for text: it re-ranks speech recognition n-best lists better than language models and much faster than masked language models. Furthermore, it offers a clearer and more principled view of what ELECTRA learns during pre-training.
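The abstract describes two mechanisms: a scalar energy score assigned to every input token, and a noise-contrastive-estimation-style training objective that separates original tokens from tokens sampled from a noise distribution. The following is a minimal, illustrative PyTorch sketch of those two ideas, not the authors' implementation: the names (EnergyScorer, nce_loss, rerank_score), the choice of noise distribution q, and how noise positions are selected are all assumptions, and details of the paper's actual loss and scoring (e.g., normalization terms) are omitted.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class EnergyScorer(nn.Module):
    """Wraps a Transformer encoder and maps each contextual token vector to a scalar energy.

    Illustrative sketch only; 'encoder' can be any BERT-style encoder returning
    (batch, seq_len, hidden_size) representations.
    """

    def __init__(self, encoder: nn.Module, hidden_size: int):
        super().__init__()
        self.encoder = encoder
        self.energy_head = nn.Linear(hidden_size, 1)

    def forward(self, input_ids: torch.LongTensor) -> torch.Tensor:
        hidden = self.encoder(input_ids)              # (batch, seq_len, hidden_size)
        return self.energy_head(hidden).squeeze(-1)   # (batch, seq_len); lower = more likely


def nce_loss(energies: torch.Tensor,
             noise_logprobs: torch.Tensor,
             is_noise: torch.Tensor,
             k: int) -> torch.Tensor:
    """Binary noise-contrastive objective: classify each position as original vs. noise-sampled.

    energies:       (batch, seq_len) scalar energy per token from the model
    noise_logprobs: (batch, seq_len) log q(token | context) under the noise distribution
    is_noise:       (batch, seq_len) float, 1.0 where the token was replaced by a noise sample
    k:              number of noise samples per data sample
    """
    # Standard NCE posterior: a token is "real" with probability
    # exp(-E) / (exp(-E) + k * q), which sigmoid(logits) recovers below.
    logits = -energies - noise_logprobs - math.log(k)
    return F.binary_cross_entropy_with_logits(logits, 1.0 - is_noise)


def rerank_score(model: EnergyScorer, input_ids: torch.LongTensor) -> torch.Tensor:
    """Unnormalized likelihood-style score for re-ranking n-best hypotheses.

    Every position is scored in a single encoder pass; a masked language model
    would instead need one forward pass per masked position.
    """
    with torch.no_grad():
        return -model(input_ids).sum(dim=-1)          # (batch,)
```

The rerank_score sketch reflects the speed claim in the abstract: summing per-token scores requires only one forward pass per hypothesis, whereas masked-LM pseudo-log-likelihood scoring masks and re-scores each position separately.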
