Generating Information-Seeking Conversations from Unlabeled Documents

Paper Title

Generating Information-Seeking Conversations from Unlabeled Documents

Authors

Gangwoo Kim, Sungdong Kim, Kang Min Yoo, Jaewoo Kang

Abstract

In this paper, we introduce SIMSEEK (Simulating information-Seeking conversation from unlabeled documents), a novel framework, and compare its two variants. In our baseline, SIMSEEK-SYM, a questioner generates follow-up questions based on an answer predetermined by an answerer. In contrast, SIMSEEK-ASYM first generates the question and then finds the corresponding answer given the conversational context. Our experiments show that both variants can synthesize effective training resources for conversational question answering (CQA) and conversational search tasks. Conversations from SIMSEEK-ASYM not only yield larger improvements in our experiments but are also rated favorably in a human evaluation. Finally, we release WIKI-SIMSEEK, a large-scale resource of synthetic conversations containing 2 million CQA pairs built upon Wikipedia documents. With this dataset, our CQA model achieves state-of-the-art performance on a recent CQA benchmark, QuAC.
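
The difference between the two variants comes down to the order of steps within each turn, which a minimal sketch can make concrete. The code below is an illustration under stated assumptions, not the authors' implementation: the function names, the callable signatures, and the toy stand-in components are all hypothetical, and the abstract does not specify what inputs each model receives. Here the SIMSEEK-ASYM questioner is assumed to see only the conversation history.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # one (question, answer) pair
History = List[Turn]


def simulate_sym(
    document: str,
    pick_answer: Callable[[str, History], str],
    ask_about: Callable[[str, History, str], str],
    num_turns: int,
) -> History:
    """Answer-first loop: pick an answer, then generate a question for it."""
    history: History = []
    for _ in range(num_turns):
        answer = pick_answer(document, history)
        question = ask_about(document, history, answer)
        history.append((question, answer))
    return history


def simulate_asym(
    document: str,
    ask_next: Callable[[History], str],
    find_answer: Callable[[str, History, str], str],
    num_turns: int,
) -> History:
    """Question-first loop: generate a question, then find its answer."""
    history: History = []
    for _ in range(num_turns):
        question = ask_next(history)  # assumption: questioner sees history only
        answer = find_answer(document, history, question)
        history.append((question, answer))
    return history


# Toy stand-ins for the learned models (hypothetical), so the sketch runs.
def _sentences(document: str) -> List[str]:
    return [s.strip() for s in document.split(".") if s.strip()]


def toy_pick_answer(document: str, history: History) -> str:
    sents = _sentences(document)
    return sents[len(history) % len(sents)]


def toy_ask_about(document: str, history: History, answer: str) -> str:
    return f"What does the document say about '{answer.split()[0]}'?"


def toy_ask_next(history: History) -> str:
    return f"Question {len(history) + 1}?"


def toy_find_answer(document: str, history: History, question: str) -> str:
    sents = _sentences(document)
    return sents[len(history) % len(sents)]


DOC = "Seoul is the capital of South Korea. It lies on the Han River."
sym_dialog = simulate_sym(DOC, toy_pick_answer, toy_ask_about, 2)
asym_dialog = simulate_asym(DOC, toy_ask_next, toy_find_answer, 2)
```

In a real system the toy callables would be replaced by trained question-generation and answer-finding models; the loop structure is the only part this sketch is meant to convey.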
