On the Integration of Linguistic Features into Statistical and Neural Machine Translation

Paper Title

On the Integration of Linguistic Features into Statistical and Neural Machine Translation

Author

Vanmassenhove, Eva

Abstract

New machine translation (MT) technologies are emerging rapidly, and with them bold claims of achieving human parity, such as: (i) the results produced approach "accuracy achieved by average bilingual human translators" (Wu et al., 2017b) or (ii) the "translation quality is at human parity when compared to professional human translators" (Hassan et al., 2018), have seen the light of day (Läubli et al., 2018). Aside from the fact that many of these papers craft their own definition of human parity, these sensational claims are often not supported by a complete analysis of all aspects involved in translation. Establishing the discrepancies between the strengths of statistical approaches to MT and the way humans translate has been the starting point of our research. By looking at MT output and linguistic theory, we were able to identify some remaining issues. The problems range from simple number and gender agreement errors to more complex phenomena such as the correct translation of aspectual values and tenses. Our experiments confirm, along with other studies (Bentivogli et al., 2016), that neural MT has surpassed statistical MT in many aspects. However, some problems remain and others have emerged. We cover a series of problems related to the integration of specific linguistic features into statistical and neural MT, aiming to analyse and provide a solution to some of them. Our work focuses on addressing three main research questions that revolve around the complex relationship between linguistics and MT in general. We identify linguistic information that is lacking in order for automatic translation systems to produce more accurate translations and integrate additional features into the existing pipelines. We identify overgeneralization or 'algorithmic bias' as a potential drawback of neural MT and link it to many of the remaining linguistic issues.
