Paper Title

Collective Wisdom: Improving Low-resource Neural Machine Translation using Adaptive Knowledge Distillation

Paper Authors

Fahimeh Saleh, Wray Buntine, Gholamreza Haffari

Paper Abstract

Scarcity of parallel sentence-pairs poses a significant hurdle for training high-quality Neural Machine Translation (NMT) models in bilingually low-resource scenarios. A standard approach is transfer learning, which involves taking a model trained on a high-resource language-pair and fine-tuning it on the data of the low-resource MT condition of interest. However, it is not clear generally which high-resource language-pair offers the best transfer learning for the target MT setting. Furthermore, different transferred models may have complementary semantic and/or syntactic strengths, hence using only one model may be sub-optimal. In this paper, we tackle this problem using knowledge distillation, where we propose to distill the knowledge of ensemble of teacher models to a single student model. As the quality of these teacher models varies, we propose an effective adaptive knowledge distillation approach to dynamically adjust the contribution of the teacher models during the distillation process. Experiments on transferring from a collection of six language pairs from IWSLT to five low-resource language-pairs from TED Talks demonstrate the effectiveness of our approach, achieving up to +0.9 BLEU score improvement compared to strong baselines.
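
The abstract describes distilling an ensemble of transferred teacher models into a single student while dynamically adjusting each teacher's contribution. The paper's exact formulation is not given here, so the following is only a minimal sketch, assuming PyTorch and an illustrative weighting rule (a softmax over each teacher's negative per-batch cross-entropy); the function name, `temperature`, and `alpha` mixing coefficient are hypothetical choices, not the authors' method.

```python
# Minimal sketch of adaptive knowledge distillation from an ensemble of
# teachers into one student. Assumes PyTorch; the weighting rule below is an
# illustrative assumption, not the paper's exact formulation.
import torch
import torch.nn.functional as F

def adaptive_distillation_loss(student_logits, teacher_logits_list, targets,
                               temperature=2.0, alpha=0.5, pad_id=0):
    """Combine the gold-label NLL with a KL term against an adaptively
    weighted mixture of teacher distributions.

    student_logits:      (batch, seq, vocab)
    teacher_logits_list: list of (batch, seq, vocab) tensors, one per teacher
    targets:             (batch, seq) token ids
    """
    mask = (targets != pad_id).float()

    # 1) Score each teacher by its cross-entropy on the current batch:
    #    a lower loss suggests the teacher is more reliable here.
    teacher_losses = []
    for t_logits in teacher_logits_list:
        ce = F.cross_entropy(t_logits.transpose(1, 2), targets,
                             reduction="none")            # (batch, seq)
        teacher_losses.append((ce * mask).sum() / mask.sum())
    teacher_losses = torch.stack(teacher_losses)           # (num_teachers,)

    # 2) Turn the losses into dynamic mixture weights
    #    (better teacher on this batch -> larger weight).
    weights = F.softmax(-teacher_losses, dim=0)

    # 3) Weighted mixture of temperature-softened teacher distributions.
    mixture = sum(w * F.softmax(t / temperature, dim=-1)
                  for w, t in zip(weights, teacher_logits_list))

    # 4) KL between the teacher mixture and the student (standard
    #    distillation term), plus the usual NLL on the gold targets.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    kl = F.kl_div(log_p_student, mixture, reduction="none").sum(-1)  # (batch, seq)
    kl = (kl * mask).sum() / mask.sum()

    nll = F.cross_entropy(student_logits.transpose(1, 2), targets,
                          reduction="none")
    nll = (nll * mask).sum() / mask.sum()

    return alpha * (temperature ** 2) * kl + (1 - alpha) * nll
```

In this sketch the teacher weights are recomputed every batch, so a teacher transferred from a language pair that happens to match the current examples well receives a larger share of the distillation signal.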
