Paper Title
Mulco: Recognizing Chinese Nested Named Entities Through Multiple Scopes
Paper Authors
Paper Abstract
Nested Named Entity Recognition (NNER) has been a long-standing challenge for researchers and is an important sub-area of Named Entity Recognition. In NNER, one entity may be part of a longer entity, and this may happen at multiple levels, as the term "nested" suggests. These nested structures prevent traditional sequence labeling methods from recognizing all entities correctly. While recent research focuses on designing better recognition methods for NNER in a variety of languages, Chinese NNER (CNNER) still lacks attention, and no freely accessible, CNNER-specialized benchmark exists. In this paper, we address CNNER by providing a Chinese dataset and a learning-based model. To facilitate research on this task, we release ChiNesE, a CNNER dataset with 20,000 sentences sampled from online passages of multiple domains, containing 117,284 entities falling into 10 categories, where 43.8 percent of those entities are nested. Based on ChiNesE, we propose Mulco, a novel method that can recognize named entities in nested structures through multiple scopes. Each scope uses a designed scope-based sequence labeling method, which predicts an anchor and the length of a named entity to recognize it. Experimental results show that Mulco outperforms several baseline methods with different recognition schemes on ChiNesE. We also conduct extensive experiments on the ACE2005 Chinese corpus, where Mulco achieves the best performance compared with the baseline methods.
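To make the anchor-and-length idea concrete, the following is a minimal sketch of how per-character (type, length) labels from several scopes could be decoded into possibly nested spans. It is an illustration under stated assumptions, not Mulco's actual implementation: the abstract does not say where the anchor sits (the sketch assumes the entity's first character), how scopes are defined (the sketch assumes each scope emits its own label sequence), or how labels are encoded. The function names, the label format, and the example sentence are all hypothetical.

```python
from typing import List, Optional, Tuple

# One label per character: None for "no entity anchored here", otherwise
# (entity_type, length). ASSUMPTION: the anchor is the entity's first
# character; the abstract does not specify the anchor position.
ScopeLabels = List[Optional[Tuple[str, int]]]
Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def decode_scope(labels: ScopeLabels) -> List[Span]:
    """Turn one scope's per-character (type, length) labels into spans."""
    spans = []
    for start, label in enumerate(labels):
        if label is None:
            continue
        entity_type, length = label
        end = start + length
        # Keep only spans with positive length that fit inside the sentence.
        if length > 0 and end <= len(labels):
            spans.append((start, end, entity_type))
    return spans


def decode_all(scopes: List[ScopeLabels]) -> List[Span]:
    """Union the spans from every scope, so nested entities can coexist."""
    found = set()
    for labels in scopes:
        found.update(decode_scope(labels))
    return sorted(found)


# Hypothetical example: "北京大学" (Peking University, ORG) contains
# "北京" (Beijing, GPE). Both start at the same character, so a single flat
# tag sequence cannot represent both; two scopes (ASSUMED layout) can.
sentence = "北京大学"
outer_scope: ScopeLabels = [("ORG", 4), None, None, None]
inner_scope: ScopeLabels = [("GPE", 2), None, None, None]

entities = [(sentence[s:e], t) for s, e, t in decode_all([outer_scope, inner_scope])]
```

Because each scope produces an ordinary one-label-per-character sequence, a standard sequence labeling model can be reused per scope, while the union over scopes recovers entities that overlap.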