Paper Title

A Simple and Effective Approach to Robust Unsupervised Bilingual Dictionary Induction

Authors

Yanyang Li, Yingfeng Luo, Ye Lin, Quan Du, Huizhen Wang, Shujian Huang, Tong Xiao, Jingbo Zhu

Abstract

Unsupervised Bilingual Dictionary Induction methods based on initialization and self-learning have achieved great success on similar language pairs, e.g., English-Spanish. However, they still fail on many distant language pairs, e.g., English-Japanese, where accuracy drops to 0%. In this work, we show that this failure results from the gap between the actual initialization performance and the minimum initialization performance required for self-learning to succeed. We propose Iterative Dimension Reduction to bridge this gap. Our experiments show that this simple method does not hamper performance on similar language pairs and achieves accuracies of 13.64% to 55.53% between English and four distant languages: Chinese, Japanese, Vietnamese, and Thai.
