Paper Title
TURNER: The Uncertainty-based Retrieval Framework for Chinese NER
Paper Authors
Paper Abstract
Chinese NER is difficult because Chinese characters are ambiguous and text lacks explicit word boundaries. Previous work on Chinese NER focuses on lexicon-based methods, which introduce boundary information and reduce out-of-vocabulary (OOV) cases during prediction. However, high-quality lexicons are expensive to obtain and to maintain dynamically in specific domains, which motivates us to use more general knowledge resources such as search engines. In this paper, we propose TURNER: The Uncertainty-based Retrieval framework for Chinese NER. The idea behind TURNER is to imitate human behavior: when we encounter an unknown or uncertain entity, we frequently retrieve auxiliary knowledge for assistance. To improve the efficiency and effectiveness of retrieval, we first propose two types of uncertainty sampling methods that select the most ambiguous entity-level uncertain components of the input text. Then, the Knowledge Fusion Model re-predicts the uncertain samples by combining the retrieved knowledge. Experiments on four benchmark datasets demonstrate TURNER's effectiveness: TURNER outperforms existing lexicon-based approaches and achieves new state-of-the-art (SOTA) results.
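The abstract does not spell out the two uncertainty sampling methods, so the following is only a minimal sketch of what entity-level uncertainty selection could look like, not the paper's actual procedure. It assumes a first-pass tagger that emits BIO tags and a per-character label distribution, scores each predicted entity by its mean token entropy, and keeps the entities above a threshold as candidates for retrieval. The function names, the entropy criterion, the threshold value, and the example probabilities are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one character's label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def bio_spans(tags: Sequence[str]) -> List[Tuple[int, int]]:
    """Return (start, end_exclusive) for each entity in a BIO tag sequence."""
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag.startswith("I-") and start is not None:
            continue  # still inside the current entity
        else:
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def select_uncertain_spans(
    chars: Sequence[str],
    tags: Sequence[str],
    probs: Sequence[Sequence[float]],
    threshold: float = 0.5,  # assumed value, would need tuning
) -> List[Tuple[str, float]]:
    """Keep predicted entities whose mean token entropy exceeds the threshold."""
    selected = []
    for start, end in bio_spans(tags):
        score = sum(token_entropy(p) for p in probs[start:end]) / (end - start)
        if score > threshold:
            selected.append(("".join(chars[start:end]), score))
    return selected


# Hypothetical first-pass output for the sentence "小明在北京".
chars = list("小明在北京")
tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]
probs = [
    [0.98, 0.01, 0.01],  # confident
    [0.98, 0.01, 0.01],  # confident
    [0.98, 0.01, 0.01],
    [0.40, 0.35, 0.25],  # uncertain
    [0.40, 0.35, 0.25],  # uncertain
]
uncertain = select_uncertain_spans(chars, tags, probs)
print(uncertain)
```

In this sketch only the low-confidence entity would be passed on as a retrieval query, which mirrors the abstract's stated goal of retrieving knowledge only for uncertain components rather than for the whole input.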