Paper Title


Towards Realistic Low-resource Relation Extraction: A Benchmark with Empirical Baseline Study

Authors

Xin Xu, Xiang Chen, Ningyu Zhang, Xin Xie, Xi Chen, Huajun Chen

Abstract


This paper presents an empirical study on building relation extraction systems in low-resource settings. Based upon recent pre-trained language models, we comprehensively investigate three schemes to evaluate performance in low-resource settings: (i) different types of prompt-based methods with few-shot labeled data; (ii) diverse balancing methods to address the long-tailed distribution issue; (iii) data augmentation techniques and self-training to generate more labeled in-domain data. We create a benchmark of 8 relation extraction (RE) datasets covering different languages, domains, and contexts, and perform extensive comparisons over the proposed schemes and their combinations. Our experiments illustrate: (i) though prompt-based tuning is beneficial in low-resource RE, there is still much potential for improvement, especially in extracting relations from cross-sentence contexts with multiple relational triples; (ii) balancing methods are not always helpful for RE with long-tailed distributions; (iii) data augmentation complements existing baselines and can bring substantial performance gains, while self-training may not consistently improve low-resource RE. Code and datasets are available at https://github.com/zjunlp/LREBench.
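To make the prompt-based tuning scheme (i) concrete, here is a minimal, hypothetical sketch of how an RE instance is typically wrapped in a cloze template and mapped to a relation label via a verbalizer. The template, relation labels, and function names below are illustrative assumptions, not taken from the paper's released code:

```python
# Hypothetical sketch of prompt-based tuning for relation extraction (RE).
# The input sentence is wrapped in a cloze template; a masked language
# model fills [MASK], and a verbalizer maps label words back to relations.

TEMPLATE = "{context} In this sentence, {head} is the [MASK] of {tail}."

# Verbalizer: maps each relation label to the word the model should
# predict at the [MASK] position (labels here are illustrative).
VERBALIZER = {
    "per:employee_of": "employee",
    "org:founded_by": "founder",
    "per:spouse": "spouse",
}

def build_prompt(context: str, head: str, tail: str) -> str:
    """Fill the cloze template with one RE instance."""
    return TEMPLATE.format(context=context, head=head, tail=tail)

def verbalize(label: str) -> str:
    """Return the label word the masked language model is tuned to fill in."""
    return VERBALIZER[label]

prompt = build_prompt("Bill Gates founded Microsoft.", "Bill Gates", "Microsoft")
# During prompt-based tuning, the pre-trained LM scores each verbalizer
# word at [MASK]; the highest-scoring word selects the predicted relation.
```

In few-shot settings this formulation reuses the masked-language-modeling head instead of training a new classification layer, which is why it tends to help when labeled data is scarce.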
