Query Resolution for Conversational Search with Limited Supervision

Paper title

Query Resolution for Conversational Search with Limited Supervision

Authors

Nikos Voskarides, Dan Li, Pengjie Ren, Evangelos Kanoulas, Maarten de Rijke

Abstract

In this work, we focus on multi-turn passage retrieval as a crucial component of conversational search. One of the key challenges in multi-turn passage retrieval is that the current-turn query is often underspecified due to zero anaphora, topic change, or topic return. Context from the conversational history can be used to arrive at a better expression of the current-turn query; we refer to this as the task of query resolution. In this paper, we model query resolution as a binary term classification problem: for each term appearing in the previous turns of the conversation, decide whether or not to add it to the current-turn query. We propose QuReTeC (Query Resolution by Term Classification), a neural query resolution model based on bidirectional transformers. We also propose a distant supervision method that automatically generates training data from query-passage relevance labels. Such labels are often readily available in a collection, either as human annotations or inferred from user interactions. We show that QuReTeC outperforms state-of-the-art models and, furthermore, that our distant supervision method can substantially reduce the amount of human-curated data required to train QuReTeC. We incorporate QuReTeC in a multi-turn, multi-stage passage retrieval architecture and demonstrate its effectiveness on the TREC CAsT dataset.
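To make the term-classification framing concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the tokenizer, the stopword list, the helper names (`distant_labels`, `resolve_query`), and the exact labeling rule (a history term is positive if it appears in a passage judged relevant to the current turn) are all hypothetical. The bidirectional-transformer classifier is omitted entirely; the distantly supervised labels stand in for its predictions.

```python
import re

# Hypothetical minimal stopword list (an assumption; not taken from the paper).
STOPWORDS = {
    "a", "about", "an", "and", "are", "do", "does", "for", "how", "in",
    "is", "it", "its", "me", "of", "the", "to", "what", "which",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def distant_labels(history: list[str], current_query: str,
                   relevant_passages: list[str]) -> dict[str, int]:
    """Assign a binary label to every candidate term from previous turns.

    Assumed labeling rule: a history term is positive (1) if it occurs in at
    least one passage judged relevant to the current turn, else negative (0).
    Stopwords and terms already in the current query are not candidates.
    """
    current_terms = set(tokenize(current_query))
    relevant_vocab: set[str] = set()
    for passage in relevant_passages:
        relevant_vocab.update(tokenize(passage))

    labels: dict[str, int] = {}
    for turn in history:
        for term in tokenize(turn):
            if term in STOPWORDS or term in current_terms:
                continue
            labels[term] = int(term in relevant_vocab)
    return labels


def resolve_query(current_query: str, labels: dict[str, int]) -> str:
    """Expand the current-turn query with every term labeled 1."""
    added = [term for term, label in labels.items() if label == 1]
    return " ".join([current_query] + added)


if __name__ == "__main__":
    history = ["Tell me about lung cancer.", "What are the risk factors?"]
    query = "What are its symptoms?"
    passages = ["Common symptoms of lung cancer include a persistent cough."]

    labels = distant_labels(history, query, passages)
    print(labels)  # {'tell': 0, 'lung': 1, 'cancer': 1, 'risk': 0, 'factors': 0}
    print(resolve_query(query, labels))  # What are its symptoms? lung cancer
```

In the setting the abstract describes, labels like these would serve as training targets, and at inference time a trained classifier would predict which history terms to add; relevance labels for the current turn are not available at that point. Appending the positive terms to the raw query is one simple way to form the resolved query and may differ from what the paper's retrieval pipeline does.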
