Paper Title
Exploit Multiple Reference Graphs for Semi-supervised Relation Extraction
Authors
Abstract
Manual annotation of labeled data for relation extraction is time-consuming and labor-intensive. Semi-supervised methods can alleviate this problem and have attracted considerable research interest. Existing work focuses on mapping unlabeled samples to classes in order to augment the labeled dataset. However, it is hard to find a mapping function that works well overall, especially for samples whose sentences contain complicated syntactic components. To address this limitation, we propose building connections between unlabeled and labeled data rather than mapping unlabeled samples directly to classes. Specifically, we first construct reference graphs from three kinds of information: entity reference, verb reference, and semantic reference. The goal is to connect unlabeled samples to labeled ones semantically or lexically. We then develop a Multiple Reference Graph (MRefG) model that exploits this reference information to better identify high-quality unlabeled samples. Extensive comparison experiments against state-of-the-art baselines on two public datasets demonstrate the effectiveness of our method.
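To make the idea of reference graphs concrete, the following is a minimal sketch of how unlabeled samples might be linked to labeled ones through the three kinds of reference named in the abstract. It is not the MRefG model itself: the `Sample` class, the `build_reference_graphs` function, the toy sentences, and the use of token-level Jaccard overlap as a stand-in for semantic reference are all assumptions made for illustration, and the abstract does not specify how each reference is actually computed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sample:
    """Hypothetical container for one sentence (not from the paper)."""
    tokens: tuple
    entities: frozenset
    verb: str
    label: Optional[str] = None  # None marks an unlabeled sample


def jaccard(a, b):
    """Token-set overlap; an assumed stand-in for semantic similarity."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_reference_graphs(unlabeled, labeled, sem_threshold=0.5):
    """Return one edge list per reference type.

    Each edge (i, j) links unlabeled[i] to labeled[j].
    """
    graphs = {"entity": [], "verb": [], "semantic": []}
    for i, u in enumerate(unlabeled):
        for j, l in enumerate(labeled):
            # Entity reference: the two sentences mention a common entity.
            if u.entities & l.entities:
                graphs["entity"].append((i, j))
            # Verb reference: the two sentences share the same verb.
            if u.verb == l.verb:
                graphs["verb"].append((i, j))
            # Semantic reference (assumption): high token overlap.
            if jaccard(u.tokens, l.tokens) >= sem_threshold:
                graphs["semantic"].append((i, j))
    return graphs


# Toy data, invented for illustration only.
labeled = [
    Sample(("Jobs", "founded", "Apple"),
           frozenset({"Jobs", "Apple"}), "founded", "founder_of"),
]
unlabeled = [
    Sample(("Gates", "founded", "Microsoft"),
           frozenset({"Gates", "Microsoft"}), "founded"),
    Sample(("Jobs", "led", "Apple"),
           frozenset({"Jobs", "Apple"}), "led"),
]

graphs = build_reference_graphs(unlabeled, labeled)
for name, edges in graphs.items():
    print(name, edges)
```

In this toy setup, the first unlabeled sentence is tied to the labeled one only through the shared verb, while the second is tied through shared entities and token overlap. A model that reads all three graphs can therefore draw on evidence that any single mapping from sentence to class might miss.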