Paper Title

Reciprocal Supervised Learning Improves Neural Machine Translation

Paper Authors

Minkai Xu, Mingxuan Wang, Zhouhan Lin, Hao Zhou, Weinan Zhang, Lei Li

Paper Abstract

Despite the recent success on image classification, self-training has only achieved limited gains on structured prediction tasks such as neural machine translation (NMT). This is mainly due to the compositionality of the target space, where the far-away prediction hypotheses lead to the notorious reinforced mistake problem. In this paper, we revisit the utilization of multiple diverse models and present a simple yet effective approach named Reciprocal-Supervised Learning (RSL). RSL first exploits individual models to generate pseudo parallel data, and then cooperatively trains each model on the combined synthetic corpus. RSL leverages the fact that different parameterized models have different inductive biases, and better predictions can be made by jointly exploiting the agreement among each other. Unlike the previous knowledge distillation methods built upon a much stronger teacher, RSL is capable of boosting the accuracy of one model by introducing other comparable or even weaker models. RSL can also be viewed as a more efficient alternative to ensemble. Extensive experiments demonstrate the superior performance of RSL on several benchmarks with significant margins.
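
For illustration only, a minimal sketch of the RSL training loop as the abstract describes it: each model produces pseudo-parallel data, and every model is then trained on the combined synthetic corpus. The NMTModel interface, the function and parameter names, and the use of unlabeled source sentences plus a single joint retraining step are illustrative assumptions, not the authors' released implementation.

# Hypothetical sketch of Reciprocal-Supervised Learning (RSL); interface and
# training details are assumptions based on the abstract, not the paper's code.
from typing import List, Protocol, Tuple


class NMTModel(Protocol):
    def translate(self, sources: List[str]) -> List[str]: ...
    def train(self, pairs: List[Tuple[str, str]]) -> None: ...


def reciprocal_supervised_learning(
    models: List[NMTModel],
    parallel_data: List[Tuple[str, str]],
    source_sentences: List[str],
    rounds: int = 1,
) -> None:
    """Each model labels the source sentences; all models are then retrained
    on the real parallel data plus the pooled pseudo-parallel corpus."""
    for _ in range(rounds):
        # 1. Every model generates pseudo targets for the (unlabeled) sources.
        pseudo_pairs: List[Tuple[str, str]] = []
        for model in models:
            hypotheses = model.translate(source_sentences)
            pseudo_pairs.extend(zip(source_sentences, hypotheses))

        # 2. Cooperatively train each model on the combined synthetic corpus.
        combined = parallel_data + pseudo_pairs
        for model in models:
            model.train(combined)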
