Paper title
PET: An Annotated Dataset for Process Extraction from Natural Language Text
Paper authors
Paper abstract
Process extraction from text is an important task of process discovery, for which various approaches have been developed in recent years. However, in contrast to other information extraction tasks, there is a lack of gold-standard corpora of business process descriptions that are carefully annotated with all the entities and relationships of interest. Due to this, it is currently hard to compare the results obtained by extraction approaches in an objective manner, whereas the lack of annotated texts also prevents the application of data-driven information extraction methodologies, typical of the natural language processing field. Therefore, to bridge this gap, we present the PET dataset, a first corpus of business process descriptions annotated with activities, gateways, actors, and flow information. We present our new resource, including a variety of baselines to benchmark the difficulty and challenges of business process extraction from text. PET can be accessed via huggingface.co/datasets/patriziobellan/PET