A Corpus of Controlled Opinionated and Knowledgeable Movie Discussions for Training Neural Conversation Models

Paper Title

A Corpus of Controlled Opinionated and Knowledgeable Movie Discussions for Training Neural Conversation Models

Authors

Galetzka, Fabian, Eneh, Chukwuemeka U., Schlangen, David

Abstract

Fully data-driven chatbots for non-goal-oriented dialogues are known to suffer from inconsistent behaviour across their turns, stemming from a general difficulty in controlling parameters like their assumed background personality and knowledge of facts. One reason for this is the relative lack of labeled data from which personality consistency and fact usage could be learned together with dialogue behaviour. To address this, we introduce a new labeled dialogue dataset in the domain of movie discussions, where every dialogue is based on pre-specified facts and opinions. We thoroughly validate the collected dialogues for adherence of the participants to their given fact and opinion profiles, and find that the general quality in this respect is high. This process also gives us an additional layer of annotation that is potentially useful for training models. We introduce as a baseline an end-to-end trained self-attention decoder model trained on this data and show that it is able to generate opinionated responses that are judged to be natural and knowledgeable and show attentiveness.
