Paper Title

Long-Short Term Masking Transformer: A Simple but Effective Baseline for Document-level Neural Machine Translation

Authors

Pei Zhang, Boxing Chen, Niyu Ge, Kai Fan

Abstract

Many document-level neural machine translation (NMT) systems have explored the utility of context-aware architectures, usually at the cost of an increasing number of parameters and greater computational complexity. However, little attention has been paid to the baseline model. In this paper, we extensively study the pros and cons of the standard Transformer in document-level translation and find that its auto-regressive property brings both the advantage of consistency and the disadvantage of error accumulation. We therefore propose a surprisingly simple long-short term masking self-attention on top of the standard Transformer to both effectively capture long-range dependencies and reduce the propagation of errors. We evaluate our approach on two publicly available document-level datasets, achieving strong BLEU results and capturing discourse phenomena.
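
The abstract only sketches the masking idea, so the following is a minimal, hypothetical illustration of what a "long-short term" attention mask could look like, assuming per-token sentence-boundary information is available for a multi-sentence input. The function name `long_short_term_masks`, the head-splitting interpretation, and the exact masking rules are illustrative assumptions, not the paper's published implementation.

```python
# Hypothetical sketch of long-short term self-attention masks, based only on the
# abstract's description; the paper's actual masking scheme may differ.
import torch

def long_short_term_masks(sent_ids: torch.Tensor):
    """Build two boolean attention masks over a document-level token sequence.

    sent_ids: (seq_len,) tensor giving the sentence index of each token.
    Returns (short_mask, long_mask), each of shape (seq_len, seq_len),
    where True means "attention allowed".
    """
    # True where query token and key token belong to the same sentence.
    same_sent = sent_ids.unsqueeze(0) == sent_ids.unsqueeze(1)
    # Short-term mask: attend only within the current sentence (local consistency).
    short_mask = same_sent
    # Long-term mask: attend only to tokens in other (context) sentences,
    # plus the token itself so no softmax row is fully masked.
    long_mask = ~same_sent | torch.eye(sent_ids.numel(), dtype=torch.bool)
    return short_mask, long_mask

# Example: a 6-token "document" made of two sentences (tokens 0-2 and 3-5).
sent_ids = torch.tensor([0, 0, 0, 1, 1, 1])
short_mask, long_mask = long_short_term_masks(sent_ids)
```

One plausible use of such masks is to assign the short-term mask to some attention heads and the long-term mask to others, so the model keeps reliable intra-sentence attention while exposing cross-sentence context through dedicated heads; whether this matches the paper's exact head assignment is an assumption here.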
