Paper Title

Fine-Grained Analysis of Cross-Linguistic Syntactic Divergences

Authors

Dmitry Nikolaev, Ofir Arviv, Taelin Karidi, Neta Kenneth, Veronika Mitnik, Lilja Maria Saeboe, Omri Abend

Abstract

The patterns in which the syntax of different languages converges and diverges are often used to inform work on cross-lingual transfer. Nevertheless, little empirical work has been done on quantifying the prevalence of different syntactic divergences across language pairs. We propose a framework for extracting divergence patterns for any language pair from a parallel corpus, building on Universal Dependencies. We show that our framework provides a detailed picture of cross-language divergences, generalizes previous approaches, and lends itself to full automation. We further present a novel dataset, a manually word-aligned subset of the Parallel UD corpus in five languages, and use it to perform a detailed corpus study. We demonstrate the usefulness of the resulting analysis by showing that it can help account for performance patterns of a cross-lingual parser.
