Paper Title


Towards Realistic Low-resource Relation Extraction: A Benchmark with Empirical Baseline Study

Authors

Xin Xu, Xiang Chen, Ningyu Zhang, Xin Xie, Xi Chen, Huajun Chen

Abstract


This paper presents an empirical study on building relation extraction systems in low-resource settings. Based upon recent pre-trained language models, we comprehensively investigate three schemes to evaluate performance in low-resource settings: (i) different types of prompt-based methods with few-shot labeled data; (ii) diverse balancing methods to address the long-tailed distribution issue; (iii) data augmentation techniques and self-training to generate more labeled in-domain data. We create a benchmark of 8 relation extraction (RE) datasets covering different languages, domains, and contexts, and perform extensive comparisons over the proposed schemes and their combinations. Our experiments illustrate: (i) though prompt-based tuning is beneficial in low-resource RE, there is still much potential for improvement, especially in extracting relations from cross-sentence contexts with multiple relational triples; (ii) balancing methods are not always helpful for RE with long-tailed distributions; (iii) data augmentation complements existing baselines and can bring substantial performance gains, while self-training may not consistently improve low-resource RE. Code and datasets are available at https://github.com/zjunlp/LREBench.
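To make the prompt-based tuning scheme (i) concrete, here is a minimal, hypothetical sketch of how an RE instance is typically wrapped in a cloze template and mapped to a relation label via a verbalizer. The template, relation labels, and function names below are illustrative assumptions, not taken from the paper's released code:

```python
# Hypothetical sketch of prompt-based tuning for relation extraction (RE).
# The input sentence is wrapped in a cloze template; a masked language
# model fills [MASK], and a verbalizer maps label words back to relations.

TEMPLATE = "{context} In this sentence, {head} is the [MASK] of {tail}."

# Verbalizer: maps each relation label to the word the model should
# predict at the [MASK] position (labels here are illustrative).
VERBALIZER = {
    "per:employee_of": "employee",
    "org:founded_by": "founder",
    "per:spouse": "spouse",
}

def build_prompt(context: str, head: str, tail: str) -> str:
    """Fill the cloze template with one RE instance."""
    return TEMPLATE.format(context=context, head=head, tail=tail)

def verbalize(label: str) -> str:
    """Return the label word the masked language model is tuned to fill in."""
    return VERBALIZER[label]

prompt = build_prompt("Bill Gates founded Microsoft.", "Bill Gates", "Microsoft")
# During prompt-based tuning, the pre-trained LM scores each verbalizer
# word at [MASK]; the highest-scoring word selects the predicted relation.
```

In few-shot settings this formulation reuses the masked-language-modeling head instead of training a new classification layer, which is why it tends to help when labeled data is scarce.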
