Title
Low Resource Neural Machine Translation: A Benchmark for Five African Languages
Authors
Abstract
Recent advances in Neural Machine Translation (NMT) have shown improvements in low-resource language (LRL) translation tasks. In this work, we benchmark NMT between English and five African LRLs: Swahili, Amharic, Tigrigna, Oromo, and Somali (SATOS). We collected the available resources on the SATOS languages to evaluate the current state of NMT for LRLs. Our evaluation, comparing a baseline single-language-pair NMT model against semi-supervised learning, transfer learning, and multilingual modeling, shows significant performance improvements in both the En-LRL and LRL-En directions. In terms of averaged BLEU score, the multilingual approach shows the largest gains, up to +5 points, in six out of ten translation directions. To demonstrate the generalization capability of each model, we also report results on multi-domain test sets. We release the standardized experimental data and the test sets for future work addressing the challenges of NMT in under-resourced settings, in particular for the SATOS languages.