Segmenting Natural Language Sentences via Lexical Unit Analysis

Paper Title

Segmenting Natural Language Sentences via Lexical Unit Analysis

Authors

Li, Yangming, Liu, Lemao, Shi, Shuming

Abstract

In this work, we present Lexical Unit Analysis (LUA), a framework for general sequence segmentation tasks. Given a natural language sentence, LUA scores all valid segmentation candidates and uses dynamic programming (DP) to extract the highest-scoring one. LUA has several appealing properties: it inherently guarantees that the predicted segmentation is valid, and it facilitates globally optimal training and inference. In addition, the practical time complexity of LUA can be reduced to linear time. We have conducted extensive experiments on 5 tasks (syntactic chunking, named entity recognition (NER), slot filling, Chinese word segmentation, and Chinese part-of-speech (POS) tagging) across 15 datasets. Our models achieve state-of-the-art performance on 13 of them. The results also show that the F1 score for identifying long segments is notably improved.
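
The decoding step described in the abstract (score every candidate segment, then use DP to pick the best full segmentation) can be illustrated with a minimal sketch. It assumes a hypothetical `span_score(start, end)` function that returns a score for the segment covering tokens `start` to `end - 1`; in LUA such scores would come from a learned model, which is not reproduced here, and segment labels are omitted. The `max_len` cap is one assumed way to make the runtime linear in sentence length; the abstract does not say how the linear bound is obtained.

```python
from typing import Callable, List, Optional, Tuple


def best_segmentation(
    n: int,
    span_score: Callable[[int, int], float],
    max_len: Optional[int] = None,
) -> Tuple[float, List[Tuple[int, int]]]:
    """Return (best total score, segments) for a sentence of n tokens.

    Segments are (start, end) pairs with `end` exclusive. Every returned
    segmentation covers the sentence exactly once, so it is valid by
    construction.
    """
    if max_len is None:
        max_len = n  # no cap: O(n^2) candidate segments
    # best[i] = best score of any segmentation of the first i tokens
    best = [float("-inf")] * (n + 1)
    back = [0] * (n + 1)  # back[i] = start of the last segment ending at i
    best[0] = 0.0
    for end in range(1, n + 1):
        # Only segments of length <= max_len are considered (assumed cap).
        for start in range(max(0, end - max_len), end):
            cand = best[start] + span_score(start, end)
            if cand > best[end]:
                best[end] = cand
                back[end] = start
    # Follow the back-pointers from the end of the sentence.
    segments: List[Tuple[int, int]] = []
    end = n
    while end > 0:
        start = back[end]
        segments.append((start, end))
        end = start
    segments.reverse()
    return best[n], segments


# Toy scores (made up for illustration): favour "New York" as one segment.
TOY_SCORES = {(0, 2): 3.0, (0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (3, 4): 1.0}


def toy_score(start: int, end: int) -> float:
    # Any span not listed above gets a low default score.
    return TOY_SCORES.get((start, end), -1.0)


tokens = ["New", "York", "is", "big"]
score, segments = best_segmentation(len(tokens), toy_score)
print(score, [tokens[s:e] for s, e in segments])
```

With `max_len` set to a small constant, the inner loop runs at most `max_len` times per position, so the total work grows linearly with the number of tokens.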
