GENEVA: Benchmarking Generalizability for Event Argument Extraction with Hundreds of Event Types and Argument Roles

Paper Title

GENEVA: Benchmarking Generalizability for Event Argument Extraction with Hundreds of Event Types and Argument Roles

Authors

Tanmay Parekh, I-Hung Hsu, Kuan-Hao Huang, Kai-Wei Chang, Nanyun Peng

Abstract

Recent work in Event Argument Extraction (EAE) has focused on improving model generalizability to cater to new events and domains. However, standard benchmarking datasets like ACE and ERE cover fewer than 40 event types and 25 entity-centric argument roles. This limited diversity and coverage hinders these datasets from adequately evaluating the generalizability of EAE models. In this paper, we first contribute a large and diverse EAE ontology. This ontology is created by transforming FrameNet, a comprehensive semantic role labeling (SRL) dataset, for EAE by exploiting the similarity between the two tasks. Then, exhaustive human expert annotations are collected to build the ontology, concluding with 115 events and 220 argument roles, with a significant portion of roles not being entities. We utilize this ontology to further introduce GENEVA, a diverse generalizability benchmarking dataset comprising four test suites, aimed at evaluating models' ability to handle limited data and to generalize to unseen event types. We benchmark six EAE models from various families. The results show that, owing to non-entity argument roles, even the best-performing model achieves only a 39% F1 score, indicating how GENEVA provides new challenges for generalization in EAE. Overall, our large and diverse EAE ontology can aid in creating more comprehensive future resources, while GENEVA is a challenging benchmarking dataset that encourages further research on improving generalizability in EAE. The code and data can be found at https://github.com/PlusLabNLP/GENEVA.
