Semi-supervised New Event Type Induction and Description via Contrastive Loss-Enforced Batch Attention

Paper Title

Semi-supervised New Event Type Induction and Description via Contrastive Loss-Enforced Batch Attention

Authors

Carl Edwards, Heng Ji

Abstract

Most event extraction methods have traditionally relied on an annotated set of event types. However, creating event ontologies and annotating supervised training data is expensive and time-consuming. Previous work has proposed semi-supervised approaches that leverage seen (annotated) types to learn how to automatically discover new event types. State-of-the-art methods, whether semi-supervised or fully unsupervised, use a form of reconstruction loss on specific tokens in a context. In contrast, we present a novel approach to semi-supervised new event type induction using a masked contrastive loss, which learns similarities between event mentions by enforcing an attention mechanism over the data minibatch. We further disentangle the discovered clusters by approximating the underlying manifolds in the data, which allows us to increase normalized mutual information and Fowlkes-Mallows scores by over 20% absolute. Building on these clustering results, we extend our approach to two new tasks: predicting the type name of the discovered clusters and linking them to FrameNet frames.
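The abstract does not give the form of the masked contrastive loss, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that attention is a row-wise softmax of scaled dot products between mention embeddings in a minibatch, that mentions without a seen type (label `None`) are masked out, and that the loss rewards attention mass placed on mentions of the same seen type. The function names `batch_attention` and `masked_contrastive_loss` are hypothetical.

```python
import math


def batch_attention(embeddings, valid):
    """Row-wise softmax over scaled dot products within one minibatch.

    Assumption: a mention never attends to itself, and mentions flagged
    as not valid (no seen type) are masked out of every row.
    """
    n = len(embeddings)
    dim = len(embeddings[0])
    attn = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if not valid[i]:
            continue
        scores = {}
        for j in range(n):
            if j != i and valid[j]:
                dot = sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
                scores[j] = dot / math.sqrt(dim)
        if not scores:
            continue
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {j: math.exp(s - top) for j, s in scores.items()}
        total = sum(exps.values())
        for j, e in exps.items():
            attn[i][j] = e / total
    return attn


def masked_contrastive_loss(embeddings, labels):
    """Mean negative log of the attention mass on same-type mentions.

    Assumption: labels[i] is a seen type name, or None for a mention
    whose type is unknown and therefore masked.
    """
    valid = [label is not None for label in labels]
    attn = batch_attention(embeddings, valid)
    losses = []
    for i, label_i in enumerate(labels):
        if label_i is None:
            continue
        positive_mass = sum(
            attn[i][j]
            for j, label_j in enumerate(labels)
            if j != i and label_j == label_i
        )
        if positive_mass > 0:  # skip mentions with no same-type partner
            losses.append(-math.log(positive_mass))
    return sum(losses) / len(losses) if losses else 0.0
```

Under these assumptions, a batch whose embeddings already group by type yields a loss near zero, while a batch whose labels cut across the embedding groups yields a large loss, which is the pressure that would pull same-type mentions together during training.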
