Paper Title

Momentum Calibration for Text Generation

Paper Authors

Xingxing Zhang, Yiran Liu, Xun Wang, Pengcheng He, Yang Yu, Si-Qing Chen, Wayne Xiong, Furu Wei

Paper Abstract

The input and output of most text generation tasks can be transformed into two sequences of tokens, which can be modeled using sequence-to-sequence learning tools such as Transformers. These models are usually trained by maximizing the likelihood of the output text sequence, assuming that the input sequence and all gold preceding tokens are given during training, while during inference the model suffers from the exposure bias problem (i.e., during beam search it only has access to its previously predicted tokens rather than the gold tokens). In this paper, we propose MoCa (Momentum Calibration) for text generation. MoCa is an online method that dynamically generates slowly evolving (but consistent) samples using a momentum moving average generator with beam search, and MoCa learns to align its model scores of these samples with their actual qualities. Experiments on four text generation datasets (i.e., CNN/DailyMail, XSum, SAMSum and Gigaword) show that MoCa consistently improves strong pre-trained Transformers over vanilla fine-tuning, and we achieve state-of-the-art results on the CNN/DailyMail and SAMSum datasets.
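Below is a minimal sketch of the two ingredients the abstract describes: a momentum (exponential moving average) copy of the generator that produces slowly evolving samples, and a calibration loss that aligns the model's scores of those samples with an external quality measure (e.g., ROUGE against the reference). The function names (ema_update, calibration_loss), the pairwise ranking form of the loss, and the hyper-parameters beta and margin are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def ema_update(online_model, momentum_model, beta=0.999):
    """Momentum (exponential moving average) update of the generator weights.

    The momentum generator is the slowly evolving copy used to produce
    calibration samples; beta is an assumed hyper-parameter.
    """
    with torch.no_grad():
        for p_online, p_momentum in zip(online_model.parameters(),
                                        momentum_model.parameters()):
            p_momentum.mul_(beta).add_(p_online, alpha=1.0 - beta)

def calibration_loss(model_scores, qualities, margin=0.01):
    """Pairwise ranking loss that pushes the online model to score
    higher-quality beam-search candidates above lower-quality ones.

    model_scores: (num_candidates,) length-normalized log-probabilities
                  of the candidates under the online model
    qualities:    (num_candidates,) external quality metric per candidate
    """
    # Order candidates from highest to lowest quality.
    order = torch.argsort(qualities, descending=True)
    scores = model_scores[order]
    loss = model_scores.new_zeros(())
    n = scores.size(0)
    for i in range(n):
        for j in range(i + 1, n):
            # Require a rank-dependent margin between better and worse candidates.
            loss = loss + F.relu(scores[j] - scores[i] + margin * (j - i))
    return loss / max(n * (n - 1) / 2, 1)
```

In this sketch the calibration loss would be combined with the usual maximum-likelihood objective during fine-tuning, while ema_update is applied after each optimizer step so the sample-generating model changes slowly but consistently.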
