Paper Title

Soft Gazetteers for Low-Resource Named Entity Recognition

Authors

Shruti Rijhwani, Shuyan Zhou, Graham Neubig, Jaime Carbonell

Abstract

Traditional named entity recognition models use gazetteers (lists of entities) as features to improve performance. Although modern neural network models do not require such hand-crafted features for strong performance, recent work has demonstrated their utility for named entity recognition on English data. However, designing such features for low-resource languages is challenging, because exhaustive entity gazetteers do not exist in these languages. To address this problem, we propose a method of "soft gazetteers" that incorporates ubiquitously available information from English knowledge bases, such as Wikipedia, into neural named entity recognition models through cross-lingual entity linking. Our experiments on four low-resource languages show an average improvement of 4 points in F1 score. Code and data are available at https://github.com/neulab/soft-gazetteers.
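
To make the idea of a "soft" gazetteer feature more concrete, here is a minimal sketch in Python. It assumes that a cross-lingual entity linker has already returned, for one text span, a list of English knowledge-base candidates, each with an entity type and a confidence score in [0, 1]. The function name `soft_gazetteer_features`, the label set `ENTITY_TYPES`, the max-score aggregation, and the example scores are all illustrative assumptions; the abstract does not specify the feature design, so this should not be read as the authors' implementation.

```python
from typing import Dict, List, Tuple

# Assumed label set for illustration; the abstract does not list the types used.
ENTITY_TYPES = ["PER", "ORG", "LOC", "GPE"]


def soft_gazetteer_features(candidates: List[Tuple[str, float]]) -> List[float]:
    """Turn entity-linking candidates for one span into a per-type feature vector.

    Each candidate is (entity_type, linking_score). Instead of a binary
    "is this span in the gazetteer" flag, each type gets the highest
    linking score among candidates of that type (an assumed aggregation).
    """
    best: Dict[str, float] = {t: 0.0 for t in ENTITY_TYPES}
    for entity_type, score in candidates:
        if entity_type in best:  # ignore types outside the assumed label set
            best[entity_type] = max(best[entity_type], score)
    return [best[t] for t in ENTITY_TYPES]


# Hypothetical linker output for a single span in a low-resource language.
candidates = [("LOC", 0.62), ("GPE", 0.85), ("PER", 0.10), ("LOC", 0.40)]
features = soft_gazetteer_features(candidates)
print(dict(zip(ENTITY_TYPES, features)))
```

A vector like this could be concatenated with the word representation of each token in the span before it is fed to a neural tagger. A span with no candidates simply yields an all-zero vector, so the tagger falls back on its other inputs.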
