MULTI3NLU++: A Multilingual, Multi-Intent, Multi-Domain Dataset for Natural Language Understanding in Task-Oriented Dialogue

Paper title

MULTI3NLU++: A Multilingual, Multi-Intent, Multi-Domain Dataset for Natural Language Understanding in Task-Oriented Dialogue

Authors

Nikita Moghe, Evgeniia Razumovskaia, Liane Guillou, Ivan Vulić, Anna Korhonen, Alexandra Birch

Abstract

Task-oriented dialogue (TOD) systems have been widely deployed in many industries as they deliver more efficient customer support. These systems are typically constructed for a single domain or language and do not generalise well beyond this. To support work on Natural Language Understanding (NLU) in TOD across multiple languages and domains simultaneously, we constructed MULTI3NLU++, a multilingual, multi-intent, multi-domain dataset. MULTI3NLU++ extends the English only NLU++ dataset to include manual translations into a range of high, medium, and low resource languages (Spanish, Marathi, Turkish and Amharic), in two domains (BANKING and HOTELS). Because of its multi-intent property, MULTI3NLU++ represents complex and natural user goals, and therefore allows us to measure the realistic performance of TOD systems in a varied set of the world's languages. We use MULTI3NLU++ to benchmark state-of-the-art multilingual models for the NLU tasks of intent detection and slot labelling for TOD systems in the multilingual setting. The results demonstrate the challenging nature of the dataset, particularly in the low-resource language setting, offering ample room for future experimentation in multi-domain multilingual TOD setups.
