Paper Title
Residual Energy-Based Models for Text Generation
Paper Authors
Paper Abstract
Text generation is ubiquitous in many NLP tasks, from summarization, to dialogue and machine translation. The dominant parametric approach is based on locally normalized models which predict one word at a time. While these work remarkably well, they are plagued by exposure bias due to the greedy nature of the generation process. In this work, we investigate un-normalized energy-based models (EBMs) which operate not at the token but at the sequence level. In order to make training tractable, we first work in the residual of a pretrained locally normalized language model and second we train using noise contrastive estimation. Furthermore, since the EBM works at the sequence level, we can leverage pretrained bi-directional contextual representations, such as BERT and RoBERTa. Our experiments on two large language modeling datasets show that residual EBMs yield lower perplexity compared to locally normalized baselines. Moreover, generation via importance sampling is very efficient and of higher quality than the baseline models according to human evaluation.
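To make the residual formulation above concrete, here is a minimal sketch of how such a model is typically written. The notation is an assumption, not taken from the abstract itself: $P_{\mathrm{LM}}$ denotes the pretrained locally normalized language model, $E_\theta$ the sequence-level energy network, and $x_{1:p}$ a prefix with continuation $x_{p+1:T}$.

$$
P_\theta(x_{p+1:T}\mid x_{1:p}) \;=\; \frac{P_{\mathrm{LM}}(x_{p+1:T}\mid x_{1:p})\,\exp\!\big(-E_\theta(x_{1:T})\big)}{Z_\theta(x_{1:p})},
\qquad
Z_\theta(x_{1:p}) \;=\; \sum_{x'_{p+1:T}} P_{\mathrm{LM}}(x'_{p+1:T}\mid x_{1:p})\,\exp\!\big(-E_\theta(x_{1:p},\,x'_{p+1:T})\big).
$$

Under this sketch, the partition function $Z_\theta$ is intractable, which is what motivates the two choices mentioned in the abstract: training the energy network with noise contrastive estimation rather than maximum likelihood, and generating via importance sampling, i.e. drawing candidate continuations from $P_{\mathrm{LM}}$ and reweighting or resampling them with weights proportional to $\exp(-E_\theta(\cdot))$.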