Open-Domain Question Answering Goes Conversational via Question Rewriting

Paper title

Open-Domain Question Answering Goes Conversational via Question Rewriting

Authors

Raviteja Anantha, Svitlana Vakulenko, Zhucheng Tu, Shayne Longpre, Stephen Pulman, Srinivas Chappidi

Abstract

We introduce a new dataset for Question Rewriting in Conversational Context (QReCC), which contains 14K conversations with 80K question-answer pairs. The task in QReCC is to find answers to conversational questions within a collection of 10M web pages (split into 54M passages). Answers to questions in the same conversation may be distributed across several web pages. QReCC provides annotations that allow us to train and evaluate individual subtasks of question rewriting, passage retrieval and reading comprehension required for the end-to-end conversational question answering (QA) task. We report the effectiveness of a strong baseline approach that combines the state-of-the-art model for question rewriting, and competitive models for open-domain QA. Our results set the first baseline for the QReCC dataset with F1 of 19.10, compared to the human upper bound of 75.45, indicating the difficulty of the setup and a large room for improvement.

Paper title