Paper Title
Energy-Based Reranking: Improving Neural Machine Translation Using Energy-Based Models
Paper Authors
Paper Abstract
The discrepancy between maximum likelihood estimation (MLE) and task measures such as BLEU score has been studied before for autoregressive neural machine translation (NMT) and has resulted in alternative training algorithms (Ranzato et al., 2016; Norouzi et al., 2016; Shen et al., 2016; Wu et al., 2018). However, MLE training remains the de facto approach for autoregressive NMT because of its computational efficiency and stability. Despite this mismatch between the training objective and task measure, we notice that samples drawn from an MLE-trained NMT model support the desired distribution: there are samples with much higher BLEU scores than the beam decoding output. To benefit from this observation, we train an energy-based model to mimic the behavior of the task measure (i.e., the energy-based model assigns lower energy to samples with higher BLEU scores), which results in a re-ranking algorithm over samples drawn from the NMT model: energy-based re-ranking (EBR). We use both marginal energy models (over the target sentence) and joint energy models (over both source and target sentences). Our EBR with the joint energy model consistently improves the performance of Transformer-based NMT: +4 BLEU points on IWSLT'14 German-English, +3.0 BLEU points on Sinhala-English, and +1.2 BLEU points on WMT'16 English-German.
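
To make the re-ranking step concrete, below is a minimal Python sketch of EBR decoding as described in the abstract. The names sample_translations and joint_energy are hypothetical stand-ins for the paper's NMT sampler and trained joint energy model, not its actual API; the sketch only illustrates the argmin-over-energy selection.

from typing import Callable, List

def ebr_decode(
    source: str,
    sample_translations: Callable[[str, int], List[str]],  # draws candidate translations from the MLE-trained NMT model
    joint_energy: Callable[[str, str], float],             # E(source, target); trained so lower energy tracks higher BLEU
    num_samples: int = 100,
) -> str:
    """Re-rank sampled translations by energy and return the lowest-energy candidate."""
    candidates = sample_translations(source, num_samples)
    # Because the energy model mimics the task measure, taking the argmin
    # over energies approximates picking the highest-BLEU sample.
    return min(candidates, key=lambda target: joint_energy(source, target))

A marginal-energy variant would score the target sentence alone (e.g., a hypothetical marginal_energy(target)); the joint model, which conditions on both source and target, is the variant the abstract reports as consistently stronger.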