Leveraging Expert Guided Adversarial Augmentation For Improving Generalization in Named Entity Recognition

Paper Title

Leveraging Expert Guided Adversarial Augmentation For Improving Generalization in Named Entity Recognition

Authors

Aaron Reich, Jiaao Chen, Aastha Agrawal, Yanzhe Zhang, Diyi Yang

Abstract

Named Entity Recognition (NER) systems often demonstrate great performance on in-distribution data, but perform poorly on examples drawn from a shifted distribution. One way to evaluate the generalization ability of NER models is to use adversarial examples, on which the specific variations associated with named entities are rarely considered. To this end, we propose leveraging expert-guided heuristics to change the entity tokens and their surrounding contexts thereby altering their entity types as adversarial attacks. Using expert-guided heuristics, we augmented the CoNLL 2003 test set and manually annotated it to construct a high-quality challenging set. We found that state-of-the-art NER systems trained on CoNLL 2003 training data drop performance dramatically on our challenging set. By training on adversarial augmented training examples and using mixup for regularization, we were able to significantly improve the performance on the challenging set as well as improve out-of-domain generalization which we evaluated by using OntoNotes data. We have publicly released our dataset and code at https://github.com/GT-SALT/Guided-Adversarial-Augmentation.
