Paper Title

NLU++: A Multi-Label, Slot-Rich, Generalisable Dataset for Natural Language Understanding in Task-Oriented Dialogue

Paper Authors

Iñigo Casanueva, Ivan Vulić, Georgios P. Spithourakis, Paweł Budzianowski

Paper Abstract

We present NLU++, a novel dataset for natural language understanding (NLU) in task-oriented dialogue (ToD) systems, with the aim to provide a much more challenging evaluation environment for dialogue NLU models, up to date with the current application and industry requirements. NLU++ is divided into two domains (BANKING and HOTELS) and brings several crucial improvements over current commonly used NLU datasets. 1) NLU++ provides fine-grained domain ontologies with a large set of challenging multi-intent sentences, introducing and validating the idea of intent modules that can be combined into complex intents that convey complex user goals, combined with finer-grained and thus more challenging slot sets. 2) The ontology is divided into domain-specific and generic (i.e., domain-universal) intent modules that overlap across domains, promoting cross-domain reusability of annotated examples. 3) The dataset design has been inspired by the problems observed in industrial ToD systems, and 4) it has been collected, filtered and carefully annotated by dialogue NLU experts, yielding high-quality annotated data. Finally, we benchmark a series of current state-of-the-art NLU models on NLU++; the results demonstrate the challenging nature of the dataset, especially in low-data regimes, the validity of `intent modularisation', and call for further research on ToD NLU.
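To make the "intent modularisation" idea concrete, below is a minimal, hypothetical Python sketch of what a multi-label, slot-annotated NLU++-style example could look like. The field names, intent-module labels, and span format are illustrative assumptions, not the dataset's actual schema.

```python
# Hypothetical sketch of an NLU++-style annotated example.
# Field names, intent-module labels, and the span format are illustrative
# assumptions, not the dataset's actual schema.

text = "Can I change my booking to next Friday and pay by card?"
value = "next Friday"
start = text.index(value)

example = {
    "domain": "HOTELS",
    "text": text,
    # Multi-label annotation: generic (domain-universal) intent modules such
    # as "change" or "pay" combine with domain-specific ones such as
    # "booking" to express a single complex user goal.
    "intents": ["change", "booking", "pay", "card"],
    # Fine-grained slots with character offsets into `text`.
    "slots": {"date": {"value": value, "span": [start, start + len(value)]}},
}

def has_modules(ex: dict, required: set) -> bool:
    """True if the example carries every required intent module."""
    return set(required).issubset(ex["intents"])

print(has_modules(example, {"change", "booking"}))  # -> True
```

Because generic modules such as "change" recur in both BANKING and HOTELS, examples annotated this way can be reused across domains, which is the cross-domain reusability the abstract describes.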
