Paper Title

Establishing a New State-of-the-Art for French Named Entity Recognition

Paper Authors

Suárez, Pedro Javier Ortiz; Dupont, Yoann; Muller, Benjamin; Romary, Laurent; Sagot, Benoît

Abstract

The French TreeBank, developed at the University Paris 7, is the main source of morphosyntactic and syntactic annotations for French. However, it does not include explicit information about named entities, which is among the most useful information for several natural language processing tasks and applications. Moreover, no large-scale French corpus with named entity annotations contains referential information, which complements the type and span of each mention with an indication of the entity it refers to. We have manually annotated the French TreeBank with such information, after an automatic pre-annotation step. We sketch the underlying annotation guidelines and provide a few figures about the resulting annotations.
