Paper Title

Enhanced back-translation for low resource neural machine translation using self-training

Authors

Idris Abdulmumin, Bashir Shehu Galadanci, Abubakar Isa

Abstract

Improving neural machine translation (NMT) models using the back-translations of monolingual target data (synthetic parallel data) is currently the state-of-the-art approach for training improved translation systems. The quality of the backward system, which is trained on the available parallel data and used for the back-translation, has been shown in many studies to affect the performance of the final NMT model. In low resource conditions, the available parallel data is usually not enough to train a backward model that can produce the high-quality synthetic data needed to train a standard translation model. This work proposes a self-training strategy where the output of the backward model is used to improve the model itself through the forward translation technique. The technique was shown to improve baseline low resource IWSLT'14 English-German and IWSLT'15 English-Vietnamese backward translation models by 11.06 and 1.5 BLEU points, respectively. The synthetic data generated by the improved English-German backward model was used to train a forward model that outperformed another forward model trained using standard back-translation by 2.7 BLEU points.
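
The abstract describes a three-step pipeline: train a backward model on the authentic parallel data, improve it via self-training on its own forward translations, then use the improved model for standard back-translation when training the final forward model. The sketch below illustrates that flow for the English-German setting. The helper functions and toy data are hypothetical placeholders, not the authors' code; in a real implementation, train_model and translate would wrap a full NMT toolkit's training and decoding routines.

```python
# A minimal sketch of self-training enhanced back-translation, assuming
# hypothetical stand-in helpers rather than a real NMT toolkit.

def train_model(pairs):
    # Hypothetical: train an NMT model on (source, target) sentence pairs.
    return {"training_data": list(pairs)}

def translate(model, sentences):
    # Hypothetical: decode each input sentence with the given model.
    return ["<hypothesis for: %s>" % s for s in sentences]

# Toy data for an English-German forward system. The backward model
# translates German (the forward system's target side) into English.
parallel_de_en = [("ein kleiner Satz", "a small sentence")]
mono_german = ["noch ein Satz", "ein weiterer Satz"]  # monolingual target data

# Step 1: train the baseline backward (German -> English) model on the
# available authentic parallel data.
backward = train_model(parallel_de_en)

# Step 2: self-training via forward translation. The backward model labels
# the monolingual German data itself; the synthetic (de, en) pairs are mixed
# with the authentic data to retrain an improved backward model.
synthetic_en = translate(backward, mono_german)
self_training_pairs = list(zip(mono_german, synthetic_en))
backward_improved = train_model(parallel_de_en + self_training_pairs)

# Step 3: standard back-translation with the improved backward model. Its
# German -> English outputs become the synthetic source side of pairs used,
# together with the authentic data, to train the final forward model.
back_translated_en = translate(backward_improved, mono_german)
bt_pairs = [(en, de) for de, en in zip(mono_german, back_translated_en)]
parallel_en_de = [(en, de) for de, en in parallel_de_en]
forward = train_model(parallel_en_de + bt_pairs)
```

The key design point the abstract emphasizes is Step 2: because the backward model is retrained on its own forward translations before being used for back-translation, the synthetic source sentences it produces in Step 3 are of higher quality, which the paper reports as the 2.7 BLEU point gain over standard back-translation.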
