Paper Title

Incorporating Uncertain Segmentation Information into Chinese NER for Social Media Text

Authors

Shengbin Jia, Ling Ding, Xiaojun Chen, Shijia E, Yang Xiang

Abstract

Chinese word segmentation is necessary to provide word-level information for Chinese named entity recognition (NER) systems. However, segmentation error propagation is a challenge for Chinese NER when processing colloquial data such as social media text. In this paper, we propose a model (UIcwsNN) that specializes in identifying entities in Chinese social media text, in particular by leveraging the ambiguous information produced by word segmentation. This uncertain information contains all the potential segmentation states of a sentence, which gives the model a channel for inferring deep word-level characteristics. We propose a trilogy (i.e., candidate position embedding -> position selective attention -> adaptive word convolution) to encode uncertain word segmentation information and acquire an appropriate word-level representation. Experimental results on a social media corpus show that our model effectively alleviates cascading segmentation errors and achieves a significant performance improvement of more than 2% over previous state-of-the-art methods.
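
To make "all the potential segmentation states of a sentence" concrete, the sketch below enumerates, for each character, every position tag (B = begin, M = middle, E = end, S = single) it could take under some word match. This is only an illustration, not the UIcwsNN implementation: the abstract does not say how the candidate positions are obtained, so the lexicon lookup, the function name `candidate_positions`, the `max_len` parameter, and the toy lexicon are all assumptions made here.

```python
from typing import List, Set


def candidate_positions(sentence: str, lexicon: Set[str], max_len: int = 4) -> List[Set[str]]:
    """For each character, collect every B/M/E/S tag it can take.

    Assumption: candidate words come from a plain lexicon lookup,
    which stands in for whatever segmenter output the paper uses.
    """
    tags: List[Set[str]] = [set() for _ in sentence]
    n = len(sentence)
    for start in range(n):
        # Any character may stand alone as a single-character word.
        tags[start].add("S")
        # Try every multi-character span of up to max_len characters.
        for end in range(start + 2, min(start + max_len, n) + 1):
            if sentence[start:end] not in lexicon:
                continue
            tags[start].add("B")       # first character of the word
            tags[end - 1].add("E")     # last character of the word
            for mid in range(start + 1, end - 1):
                tags[mid].add("M")     # interior characters
    return tags


# Hypothetical toy lexicon for a classic ambiguous sentence.
LEXICON = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥"}
SENTENCE = "南京市长江大桥"

for char, tagset in zip(SENTENCE, candidate_positions(SENTENCE, LEXICON)):
    print(char, sorted(tagset))
```

A character such as 市 ends up with several candidate tags because it can close 南京市 or open 市长. Keeping all of those candidates, rather than committing to one segmentation, is the kind of ambiguity the abstract proposes to encode.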
