Graph Neural Networks for Multiparallel Word Alignment

Paper title

Graph Neural Networks for Multiparallel Word Alignment

Authors

Ayyoob Imani, Lütfi Kerem Şenel, Masoud Jalili Sabet, François Yvon, Hinrich Schütze

Abstract

After a period of decrease, interest in word alignments is increasing again for their usefulness in domains such as typological research, cross-lingual annotation projection, and machine translation. Generally, alignment algorithms only use bitext and do not make use of the fact that many parallel corpora are multiparallel. Here, we compute high-quality word alignments between multiple language pairs by considering all language pairs together. First, we create a multiparallel word alignment graph, joining all bilingual word alignment pairs in one graph. Next, we use graph neural networks (GNNs) to exploit the graph structure. Our GNN approach (i) utilizes information about the meaning, position, and language of the input words, (ii) incorporates information from multiple parallel sentences, (iii) adds and removes edges from the initial alignments, and (iv) yields a prediction model that can generalize beyond the training sentences. We show that community detection provides valuable information for multiparallel word alignment. Our method outperforms previous work on three word-alignment datasets and on a downstream task.
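
The graph-construction step described in the abstract can be illustrated with a small sketch. This is not the authors' GNN method: it only shows how bilingual alignments for one multiparallel sentence might be joined into a single graph, and it uses plain connected components as a crude stand-in for community detection. The function names, the input format (a dict from language pairs to lists of word-position pairs), and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_alignment_graph(bilingual_alignments):
    """Join bilingual alignments into one graph.

    Nodes are (language, word_position) tuples; an edge links two
    words that some bilingual aligner has aligned.
    """
    graph = defaultdict(set)
    for (src_lang, tgt_lang), pairs in bilingual_alignments.items():
        for src_pos, tgt_pos in pairs:
            u, v = (src_lang, src_pos), (tgt_lang, tgt_pos)
            graph[u].add(v)
            graph[v].add(u)
    return graph


def connected_components(graph):
    """Group nodes reachable from one another (simple stand-in for communities)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def propose_missing_edges(graph, components):
    """Suggest cross-language edges absent from the initial alignments."""
    proposals = set()
    for component in components:
        for u, v in combinations(sorted(component), 2):
            if u[0] != v[0] and v not in graph[u]:
                proposals.add((u, v))
    return proposals


# Hypothetical toy input: three languages, two words per sentence.
# The en-fr aligner missed the link between the second words.
alignments = {
    ("en", "de"): [(0, 0), (1, 1)],
    ("de", "fr"): [(0, 0), (1, 1)],
    ("en", "fr"): [(0, 0)],
}
graph = build_alignment_graph(alignments)
components = connected_components(graph)
proposals = propose_missing_edges(graph, components)
print(proposals)
```

In this toy case the en-fr link for the second word is recoverable through German, which is the intuition behind using multiparallel data. The approach in the abstract goes further: it learns from word meaning, position, and language, and can remove edges as well as add them, which this sketch does not attempt.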
