Paper Title

The University of Edinburgh's Submission to the WMT22 Code-Mixing Shared Task (MixMT)

Authors

Faheem Kirefu, Vivek Iyer, Pinzhen Chen, Laurie Burchell

Abstract

The University of Edinburgh participated in the WMT22 shared task on code-mixed translation. The task consists of two subtasks: i) generating code-mixed Hindi/English (Hinglish) text from parallel Hindi and English sentences, and ii) machine translation from Hinglish to English. As both subtasks are considered low-resource, we focused our efforts on careful data generation and curation, especially the use of backtranslation from monolingual resources. For subtask 1, we explored the effects of constrained decoding on English and transliterated subwords in order to produce Hinglish. For subtask 2, we investigated different pretraining techniques, namely comparing simple initialisation from existing machine translation models and aligned augmentation. For both subtasks, we found that our baseline systems worked best. Our systems for both subtasks were among the overall top-performing submissions.
