Paper Title

Cross-Supervised Joint-Event-Extraction with Heterogeneous Information Networks

Authors

Yue Wang, Zhuo Xu, Lu Bai, Yao Wan, Lixin Cui, Qian Zhao, Edwin R. Hancock, Philip S. Yu

Abstract

Joint-event-extraction, which extracts structural information (i.e., entities or triggers of events) from unstructured real-world corpora, has attracted growing research attention in natural language processing. Most existing works do not fully address the sparse co-occurrence relationships between entities and triggers; this important information is lost, and extraction performance deteriorates as a result. To mitigate this issue, we first define joint-event-extraction as a sequence-to-sequence labeling task with a tag set composed of trigger tags and entity tags. Then, to incorporate the missing information in these co-occurrence relationships, we propose a Cross-Supervised Mechanism (CSM) that alternately supervises the extraction of either triggers or entities based on the type distribution of the other. Moreover, since the connected entities and triggers naturally form a heterogeneous information network (HIN), we leverage the latent patterns along meta-paths for a given corpus to further improve the performance of our proposed method. To verify the effectiveness of our proposed method, we conduct extensive experiments on four real-world datasets and compare our method with state-of-the-art methods. Empirical results and analysis show that our approach outperforms the state-of-the-art methods in both entity and trigger extraction.
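
As a rough illustration of the labeling formulation described in the abstract, the sketch below builds a single tag set that covers both triggers and entities and applies it to one sentence. It is not the authors' implementation: the BIO scheme, the `ENT`/`TRG` prefixes, the type names (`PER`, `GPE`, `Attack`), and the helper names `build_joint_tag_set` and `tag_tokens` are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# A span is (start, end_exclusive, kind, type); kind is "ENT" or "TRG".
# This span format is a hypothetical convention chosen for the sketch.
Span = Tuple[int, int, str, str]


def build_joint_tag_set(entity_types: Sequence[str],
                        trigger_types: Sequence[str]) -> List[str]:
    """Return one tag set containing both entity tags and trigger tags."""
    tags = ["O"]  # "O" marks tokens outside any entity or trigger
    for kind, types in (("ENT", entity_types), ("TRG", trigger_types)):
        for type_name in types:
            tags.append(f"B-{kind}:{type_name}")  # first token of a span
            tags.append(f"I-{kind}:{type_name}")  # later tokens of a span
    return tags


def tag_tokens(tokens: Sequence[str], spans: Sequence[Span]) -> List[str]:
    """Assign one joint tag per token from annotated, non-overlapping spans."""
    tags = ["O"] * len(tokens)
    for start, end, kind, type_name in spans:
        tags[start] = f"B-{kind}:{type_name}"
        for i in range(start + 1, end):
            tags[i] = f"I-{kind}:{type_name}"
    return tags


# Hypothetical example sentence and annotations.
tokens = ["Troops", "attacked", "the", "city"]
spans = [(0, 1, "ENT", "PER"), (1, 2, "TRG", "Attack"), (3, 4, "ENT", "GPE")]
joint_tags = build_joint_tag_set(["PER", "GPE"], ["Attack"])
labels = tag_tokens(tokens, spans)
```

Because entities and triggers share one output sequence, a single tagger can predict both. The cross-supervision between the two type distributions and the meta-path features over the HIN would sit on top of such a tagger and are not modeled here.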
