RMBR: A Regularized Minimum Bayes Risk Reranking Framework for Machine Translation

Paper Title

RMBR: A Regularized Minimum Bayes Risk Reranking Framework for Machine Translation

Authors

Zhang, Yidan; Wan, Yu; Liu, Dayiheng; Yang, Baosong; He, Zhenan

Abstract

Beam search is the most widely used decoding method for neural machine translation (NMT). In practice, the top-1 candidate, the one with the highest log-probability among the n candidates, is selected as the output. However, this top-1 candidate may not be the best overall translation in the n-best list. Recently, Minimum Bayes Risk (MBR) decoding has been proposed to improve the quality of NMT; it seeks a consensus translation that is closest on average to the other candidates in the n-best list. We argue that MBR still suffers from the following problems: (1) the utility function only considers lexical-level similarity between candidates; (2) the expected utility is computed over the entire n-best list, which is time-consuming, and inadequate candidates in the tail of the list may hurt performance; (3) only the relationship between candidates is considered. To solve these issues, we design a regularized MBR reranking framework (RMBR), which uses semantic-based similarity and computes the expected utility for each candidate over a truncated list. We also expect the proposed framework to take into account the translation quality and model uncertainty of each candidate. Thus, the proposed quality regularizer and uncertainty regularizer are incorporated into the framework. Extensive experiments on multiple translation tasks demonstrate the effectiveness of our method.
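
The reranking idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: the function names `token_f1` and `mbr_rerank`, the use of a token-overlap F1 as a stand-in for the semantic similarity the paper refers to, truncation to the first `top_k` candidates (assuming the list is sorted by log-probability), and the additive form and weights `alpha` and `beta` of the two regularizers.

```python
from typing import Callable, List, Optional, Sequence


def token_f1(hyp: str, ref: str) -> float:
    """Stand-in utility: F1 over unique whitespace tokens.

    Assumption: the paper uses a semantic similarity; this lexical
    overlap is only a placeholder so the sketch runs without a model.
    """
    h, r = set(hyp.split()), set(ref.split())
    overlap = len(h & r)
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(h), overlap / len(r)
    return 2 * precision * recall / (precision + recall)


def mbr_rerank(
    candidates: Sequence[str],
    utility: Callable[[str, str], float] = token_f1,
    top_k: Optional[int] = None,
    quality: Optional[Sequence[float]] = None,
    uncertainty: Optional[Sequence[float]] = None,
    alpha: float = 0.0,
    beta: float = 0.0,
) -> List[int]:
    """Return candidate indices sorted from best to worst score.

    candidates  : n-best list, assumed sorted by model log-probability.
    top_k       : size of the truncated list used as pseudo-references
                  (hypothetical parameter; None means the whole list).
    quality     : hypothetical per-candidate quality scores (higher is better).
    uncertainty : hypothetical per-candidate uncertainty scores (higher is worse).
    """
    n = len(candidates)
    support = list(range(n if top_k is None else min(top_k, n)))
    scores = []
    for i, cand in enumerate(candidates):
        # Expected utility: average similarity to the other candidates
        # in the (possibly truncated) list.
        others = [j for j in support if j != i]
        score = (
            sum(utility(cand, candidates[j]) for j in others) / len(others)
            if others
            else 0.0
        )
        # Assumed additive regularizers; the paper's exact form may differ.
        if quality is not None:
            score += alpha * quality[i]
        if uncertainty is not None:
            score -= beta * uncertainty[i]
        scores.append(score)
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    nbest = [
        "the cat sat on the mat",
        "a cat sat on the mat",
        "the cat sat on a mat",
        "dogs bark loudly",
    ]
    print(mbr_rerank(nbest))            # plain MBR over the full list
    print(mbr_rerank(nbest, top_k=2))   # expected utility over a truncated list
```

In this toy list the outlier candidate shares no tokens with the others, so its expected utility is zero and it is ranked last. Swapping `utility` for a learned semantic metric would follow the direction the abstract describes.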
