Paper Title
Neural Entity Linking on Technical Service Tickets
Paper Authors
Abstract
Entity linking, the task of mapping textual mentions to known entities, has recently been tackled using contextualized neural networks. We address the question of whether these results -- reported for large, high-quality datasets such as Wikipedia -- transfer to practical business use cases, where labels are scarce, text is low-quality, and terminology is highly domain-specific. Using an entity linking model based on BERT, a popular transformer network in natural language processing, we show that a neural approach outperforms and complements hand-coded heuristics, with improvements of about 20% top-1 accuracy. Also, the benefits of transfer learning on a large corpus are demonstrated, while fine-tuning proves difficult. Finally, we compare different BERT-based architectures and show that a simple sentence-wise encoding (Bi-Encoder) offers a fast yet effective search in practice.
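The Bi-Encoder mentioned in the abstract encodes mentions and candidate entities independently, so entity embeddings can be precomputed once and linking reduces to a nearest-neighbor search at query time. A minimal sketch of that retrieval step, using a deterministic stand-in embedding function in place of BERT (the encoder, entity catalog, and function names here are illustrative assumptions, not the paper's implementation):

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Stand-in sentence encoder: a deterministic pseudo-random unit vector.
    In the paper's setting this would be a BERT-based sentence encoding."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

# Hypothetical entity catalog; embeddings are precomputed once offline.
entities = ["printer driver", "VPN gateway", "email server"]
entity_matrix = np.stack([embed(e) for e in entities])

def link(mention: str, top_k: int = 1) -> list[str]:
    """Score a mention against all entities with a single matrix-vector
    dot product and return the top-k candidates by similarity."""
    scores = entity_matrix @ embed(mention)
    order = np.argsort(-scores)[:top_k]
    return [entities[i] for i in order]

print(link("printer driver", top_k=2))
```

Because the two sides are encoded separately, the dot-product scoring step scales to large entity catalogs (e.g. via approximate nearest-neighbor indexes), which is what makes the Bi-Encoder fast in practice compared to architectures that jointly encode each mention-entity pair.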