Paper Title
Data Rejuvenation: Exploiting Inactive Training Examples for Neural Machine Translation
Paper Authors
Paper Abstract
Large-scale training datasets lie at the core of the recent success of neural machine translation (NMT) models. However, the complex patterns and potential noise in large-scale data make training NMT models difficult. In this work, we explore identifying inactive training examples, which contribute less to model performance, and show that the existence of inactive examples depends on the data distribution. We further introduce data rejuvenation to improve the training of NMT models on large-scale datasets by exploiting inactive examples. The proposed framework consists of three phases. First, we train an identification model on the original training data and use it to distinguish inactive examples from active examples by their sentence-level output probabilities. Then, we train a rejuvenation model on the active examples, which is used to re-label the inactive examples via forward-translation. Finally, the rejuvenated examples and the active examples are combined to train the final NMT model. Experimental results on the WMT14 English-German and English-French datasets show that the proposed data rejuvenation consistently and significantly improves the performance of several strong NMT models. Extensive analyses reveal that our approach stabilizes and accelerates the training process of NMT models, resulting in final models with better generalization capability.
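
The three-phase framework described in the abstract maps naturally onto a small pipeline. Below is a minimal Python sketch of that pipeline, assuming a hypothetical `NMTModel` interface (`train`, `sentence_log_prob`, `translate`) and an illustrative inactive-example cut-off ratio; it is a reading aid under those assumptions, not the authors' implementation.

```python
from typing import List, Tuple

Example = Tuple[str, str]  # (source sentence, target sentence)


class NMTModel:
    """Hypothetical stand-in for any seq2seq toolkit; the method
    names here are assumptions, not the authors' released code."""

    def train(self, data: List[Example]) -> "NMTModel":
        # Fit the model on (source, target) pairs; returns self for chaining.
        raise NotImplementedError

    def sentence_log_prob(self, src: str, tgt: str) -> float:
        # Sentence-level output probability of tgt given src, the
        # identification signal used in the first phase.
        raise NotImplementedError

    def translate(self, src: str) -> str:
        # Forward-translation: decode a new target for src.
        raise NotImplementedError


def rejuvenate(train_data: List[Example],
               inactive_ratio: float = 0.1) -> NMTModel:
    # Phase 1: train an identification model on the original data and
    # rank examples by sentence-level output probability; the lowest-
    # probability tail is treated as "inactive". The 10% cut-off is an
    # illustrative hyperparameter, not a value taken from the paper.
    ident = NMTModel().train(train_data)
    ranked = sorted(train_data,
                    key=lambda ex: ident.sentence_log_prob(ex[0], ex[1]))
    cut = int(len(ranked) * inactive_ratio)
    inactive, active = ranked[:cut], ranked[cut:]

    # Phase 2: train a rejuvenation model on the active examples only,
    # then re-label each inactive example by forward-translating its source.
    rejuv = NMTModel().train(active)
    rejuvenated = [(src, rejuv.translate(src)) for src, _ in inactive]

    # Phase 3: train the final NMT model on active + rejuvenated examples.
    return NMTModel().train(active + rejuvenated)
```

Note that, as the abstract describes, forward-translation keeps the original source sentences and only replaces the targets of inactive examples with outputs of the rejuvenation model, so no training example is discarded in the final combined set.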