Paper Title

Faster Learned Sparse Retrieval with Guided Traversal

Authors

Antonio Mallia, Joel Mackenzie, Torsten Suel, Nicola Tonellotto

Abstract

Neural information retrieval architectures based on transformers such as BERT are able to significantly improve system effectiveness over traditional sparse models such as BM25. Though highly effective, these neural approaches are very expensive to run, making them difficult to deploy under strict latency constraints. To address this limitation, recent studies have proposed new families of learned sparse models that try to match the effectiveness of learned dense models, while leveraging the traditional inverted index data structure for efficiency. Current learned sparse models learn the weights of terms in documents and, sometimes, queries; however, they exploit different vocabulary structures, document expansion techniques, and query expansion strategies, which can make them slower than traditional sparse models such as BM25. In this work, we propose a novel indexing and query processing technique that exploits a traditional sparse model's "guidance" to efficiently traverse the index, allowing the more effective learned model to execute fewer scoring operations. Our experiments show that our guided processing heuristic is able to boost the efficiency of the underlying learned sparse model by a factor of four without any measurable loss of effectiveness.
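
To make the guidance idea concrete, here is a minimal illustrative sketch, not the authors' exact algorithm (which interleaves BM25-driven dynamic pruning, e.g. MaxScore, with learned-impact scoring in a single document-at-a-time traversal). The sketch assumes two toy in-memory indexes, `bm25_index` and `learned_index`, each mapping a term to a postings list of (doc_id, weight) pairs; the function name `guided_topk` and the candidate-pool size are hypothetical choices for illustration only.

```python
import heapq
from collections import defaultdict

def guided_topk(query_terms, bm25_index, learned_index, k=10, pool=100):
    """Illustrative two-pass sketch of guided processing: cheap BM25
    postings decide which documents become candidates, and the learned
    impacts are accumulated only for those candidates, so the learned
    model performs far fewer scoring operations."""
    # Pass 1: candidate selection using the traditional BM25 postings only.
    bm25_scores = defaultdict(float)
    for term in query_terms:
        for doc_id, weight in bm25_index.get(term, []):
            bm25_scores[doc_id] += weight
    candidates = set(heapq.nlargest(pool, bm25_scores, key=bm25_scores.get))

    # Pass 2: score only the surviving candidates with the learned impacts.
    learned_scores = defaultdict(float)
    for term in query_terms:
        for doc_id, weight in learned_index.get(term, []):
            if doc_id in candidates:
                learned_scores[doc_id] += weight
    return heapq.nlargest(k, learned_scores.items(), key=lambda x: x[1])

# Toy usage with made-up postings (doc_id, weight):
bm25_index = {"sparse": [(1, 2.1), (7, 1.4)], "retrieval": [(1, 1.9), (3, 2.5)]}
learned_index = {"sparse": [(1, 4.0), (3, 0.5), (7, 2.2)], "retrieval": [(1, 3.1), (3, 2.8)]}
print(guided_topk(["sparse", "retrieval"], bm25_index, learned_index, k=2))
```

The design point the sketch tries to capture is that candidate selection is driven by the smaller, unexpanded traditional index, while the final ranking scores come from the learned sparse model, which is how the paper avoids measurable effectiveness loss while cutting scoring work.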
