Paper Title

The Impact of Edge Displacement Vaserstein Distance on UD Parsing Performance

Authors

Mark Anderson, Carlos Gómez-Rodríguez

Abstract

We contribute to the discussion on parsing performance in NLP by introducing a measurement that evaluates the differences between the distributions of edge displacement (the directed distance of edges) seen in training and test data. We hypothesize that this measurement will be related to differences observed in parsing performance across treebanks. We motivate this by building upon previous work and then attempt to falsify this hypothesis by using a number of statistical methods. We establish that there is a statistical correlation between this measurement and parsing performance even when controlling for potential covariates. We then use this to establish a sampling technique that gives us an adversarial and complementary split. This gives an idea of the lower and upper bounds of parsing systems for a given treebank in lieu of freshly sampled data. In a broader sense, the methodology presented here can act as a reference for future correlation-based exploratory work in NLP.
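
As an illustration of the measurement described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes dependency trees are given as lists of 1-based head indices (0 for the root, as in CoNLL-U), that displacement is taken as head position minus dependent position (the sign convention is an assumption), and that root edges are skipped. The helper names `edge_displacements` and `wasserstein_1d` are hypothetical. The distance is the standard one-dimensional Wasserstein-1 distance, computed as the area between the two empirical cumulative distribution functions.

```python
from collections import Counter


def edge_displacements(heads):
    """Return the directed distance of every non-root edge in one sentence.

    heads[i] is the 1-based head index of token i + 1; 0 marks the root.
    Assumed convention: displacement = head position - dependent position.
    """
    return [h - (i + 1) for i, h in enumerate(heads) if h != 0]


def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two empirical integer distributions."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    cx, cy = Counter(xs), Counter(ys)
    support = sorted(set(cx) | set(cy))
    cdf_x = cdf_y = 0.0
    total = 0.0
    # Sum |F_x - F_y| over each gap between consecutive support points.
    for a, b in zip(support, support[1:]):
        cdf_x += cx[a] / len(xs)
        cdf_y += cy[a] / len(ys)
        total += abs(cdf_x - cdf_y) * (b - a)
    return total


# Toy "training" and "test" sentences (invented for illustration only).
train = edge_displacements([2, 0, 2])   # edges with displacement +1 and -1
test = edge_displacements([0, 1, 2])    # two edges with displacement -1
print(wasserstein_1d(train, test))
```

In practice the displacements would be pooled over every sentence in the training and test portions of a treebank before the distance is computed.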
