Research on Multi-Dimensional End-to-End Phrase Recognition Algorithm Based on Background Knowledge

Title

Research on multi-dimensional end-to-end phrase recognition algorithm based on background knowledge

Authors

Li, Zheng; Tu, Gang; Liu, Guang; Zhan, Zhi-Qiang; Liu, Yi-Jian

Abstract

At present, deep end-to-end methods based on supervised learning are used for entity recognition and dependency parsing. These methods have two problems: first, they cannot incorporate background knowledge; second, they cannot recognize the multi-granularity and nested characteristics of natural language. To solve these problems, annotation rules based on phrase windows are proposed, and a corresponding multi-dimensional end-to-end phrase recognition algorithm is designed. The annotation rules divide a sentence into seven types of nested phrases and mark the dependencies between phrases. The algorithm can not only introduce background knowledge and recognize all kinds of nested phrases in a sentence, but can also recognize the dependencies between those phrases. Experimental results show that the annotation rules are easy to use and unambiguous, and that the corresponding algorithm is more consistent with the multi-granularity and diversity of syntax than traditional end-to-end algorithms. In experiments on the CPWD dataset, introducing background knowledge allowed the new algorithm to improve the accuracy of the end-to-end method by more than one percentage point. The method was applied in the CCL 2018 competition and won first place in the Chinese humor type recognition task.
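The abstract does not spell out the tag scheme, so the following is only a minimal sketch of the general idea that nested phrases need more than one label per token. It assumes that each "dimension" is a separate BIO tag layer for one phrase type. The phrase types `NP`, `ORG`, and `LOC`, the helpers `encode_flat`, `encode_layers`, and `decode_layers`, and the example sentence are all hypothetical; they are not the paper's seven phrase types or its algorithm, and the sketch ignores background knowledge and inter-phrase dependencies entirely. It contrasts a single flat tag sequence, where nested spans overwrite each other, with per-type layers that can each mark a span starting on the same token.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical span format: (start token index, end index exclusive, phrase type).
Span = Tuple[int, int, str]


def encode_flat(n_tokens: int, spans: List[Span]) -> List[str]:
    """Single-layer BIO: one tag per token, so later spans overwrite earlier ones."""
    tags = ["O"] * n_tokens
    for start, end, ptype in spans:
        tags[start] = "B-" + ptype
        for i in range(start + 1, end):
            tags[i] = "I-" + ptype
    return tags


def encode_layers(n_tokens: int, spans: List[Span], types: List[str]) -> Dict[str, List[str]]:
    """One BIO layer (a hypothetical "dimension") per phrase type, so spans of different types can nest."""
    layers = {t: ["O"] * n_tokens for t in types}
    for start, end, ptype in spans:
        layer = layers[ptype]
        layer[start] = "B"
        for i in range(start + 1, end):
            layer[i] = "I"
    return layers


def decode_layers(layers: Dict[str, List[str]]) -> List[Span]:
    """Read spans back out of each layer independently and return them sorted."""
    spans: List[Span] = []
    for ptype, tags in layers.items():
        start: Optional[int] = None
        for i, tag in enumerate(tags):
            if tag in ("B", "O") and start is not None:
                spans.append((start, i, ptype))  # close the currently open span
                start = None
            if tag == "B":
                start = i
            # An "I" extends the open span; a stray "I" with no open span is ignored.
        if start is not None:
            spans.append((start, len(tags), ptype))  # span runs to the end of the sentence
    return sorted(spans)


if __name__ == "__main__":
    tokens = ["北京", "大学", "的", "学生"]  # "students of Peking University"
    gold = [(0, 4, "NP"), (0, 2, "ORG"), (0, 1, "LOC")]  # three nested spans all start at token 0

    print(encode_flat(len(tokens), gold))
    # ['B-LOC', 'I-ORG', 'I-NP', 'I-NP']: only one span start survives on token 0

    layers = encode_layers(len(tokens), gold, ["NP", "ORG", "LOC"])
    print(decode_layers(layers) == sorted(gold))  # True: every nested span is recovered
```

One limitation of this particular sketch: a phrase nested inside another phrase of the same type still collides within that type's layer, so a scheme covering such cases would need more than one layer per type or a different encoding.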
