Paper Title
Learning To Retrieve: How to Train a Dense Retrieval Model Effectively and Efficiently

Authors

Zhan, Jingtao, Mao, Jiaxin, Liu, Yiqun, Zhang, Min, Ma, Shaoping

Abstract


Ranking has always been one of the top concerns in information retrieval research. For decades, lexical matching signals have dominated the ad-hoc retrieval process, but they also have inherent defects, such as the vocabulary mismatch problem. Recently, the Dense Retrieval (DR) technique has been proposed to alleviate these limitations by capturing the deep semantic relationship between queries and documents. The training of most existing Dense Retrieval models relies on sampling negative instances from the corpus to optimize a pairwise loss function. Through investigation, we find that this kind of training strategy is biased and fails to optimize full retrieval performance effectively and efficiently. To solve this problem, we propose a Learning To Retrieve (LTRe) training technique. LTRe constructs the document index beforehand. At each training iteration, it performs full retrieval without negative sampling and then updates the query representation model parameters. Through this process, it teaches the DR model how to retrieve relevant documents from the entire corpus instead of how to rerank a potentially biased sample of documents. Experiments on both passage retrieval and document retrieval tasks show that: 1) in terms of effectiveness, LTRe significantly outperforms all competitive sparse and dense baselines, and it even gains better performance than the BM25-BERT cascade system under reasonable latency constraints; 2) in terms of training efficiency, compared with the previous state-of-the-art DR method, LTRe provides more than a 170x speed-up in the training process. Training with a compressed index further saves computing resources at a minor performance cost.
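The core training loop described in the abstract (a frozen, precomputed document index; full retrieval over all documents at every step; updates applied only to the query encoder) can be sketched as below. This is a minimal illustration, not the paper's implementation: the toy linear `query_encoder`, the hinge-style pairwise loss, the brute-force matrix-product "index", and all dimensions are assumptions for demonstration, whereas the actual paper uses a BERT-based encoder and a large-scale (possibly compressed) document index.

```python
import torch

torch.manual_seed(0)

num_docs, dim, feat_dim, top_k = 1000, 32, 16, 10

# Document index built beforehand and kept FIXED during training
# (in LTRe only the query representation model is updated).
doc_embs = torch.randn(num_docs, dim)

# Hypothetical tiny query encoder standing in for a BERT encoder.
query_encoder = torch.nn.Linear(feat_dim, dim)
opt = torch.optim.Adam(query_encoder.parameters(), lr=1e-3)

def ltre_step(query_feats, rel_doc_id):
    """One training iteration: full retrieval, then a query-side update."""
    q = query_encoder(query_feats)            # query embedding, shape (dim,)
    scores = doc_embs @ q                     # score ALL docs: full retrieval,
                                              # no negative sampling
    top = torch.topk(scores, top_k).indices   # retrieved candidates
    # Pairwise loss over the retrieval result: the relevant document
    # should outscore every retrieved non-relevant document.
    pos = scores[rel_doc_id]
    negs = scores[top[top != rel_doc_id]]
    loss = torch.relu(1.0 - pos + negs).mean()
    opt.zero_grad()
    loss.backward()                           # gradients flow only into the
    opt.step()                                # query encoder; the index is fixed
    return loss.item()

loss = ltre_step(torch.randn(feat_dim), rel_doc_id=3)
```

Because the document side is frozen, each iteration is cheap: the expensive index is built once, and training reduces to encoding queries and searching the index, which is consistent with the large training speed-up the abstract reports.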