论文标题

OSM-TREE:分类索引

OSM-tree: A Sortedness-Aware Index

论文作者

Raman, Aneesh, Sarkar, Subhadeep, Olma, Matthaios, Athanassoulis, Manos

论文摘要

当选择谓词在索引键上时,索引促进有效的查询。结果,当加载数据时,如果我们预计将来的选择性(点或范围)查询,我们通常维护一个索引,随着新数据的摄入,该索引逐渐被填充。在这方面,可以将索引视为将结构添加到传入的数据收集的过程中。添加结构的过程是有代价的,因为将每个新条目都插入索引中,而不是简单地附加到传入的数据。如果数据摄入顺序与索引属性顺序匹配,则摄入成本是完全冗余的,并且可以避免(例如,通过B+-tree中的散装加载)。但是,当数据以接近分类但未完全分类的顺序摄入数据时,最新的索引设计不会受益。在本文中,我们研究了索引如何从部分数据分类或近等级中受益,并且我们提出了一系列技术集合,将批量负载,索引附加,可变节点填充/拆分因子和缓冲因素结合起来,以优化偏分态性分类性的树指数的摄入成本。我们通过必要的元数据结构进一步增强了拟议的设计,以确保竞争性阅读性能。我们在最先进的B+树上应用所提出的设计范式,并提出了有序的排序与树(OSM-Tree)。 OSM-Tree在存在分类的情况下以高达8.8倍的摄入性能优于艺术状态,同时又回到了B+树的摄入性能时,当数据被炒作时。 OSM-TREE提供了竞争性的查询性能,可为混合阅读工作负载带来28%至5倍的性能优势。

Indexes facilitate efficient querying when the selection predicate is on an indexed key. As a result, when loading data, if we anticipate future selective (point or range) queries, we typically maintain an index that is gradually populated as new data is ingested. In that respect, indexing can be perceived as the process of adding structure to an incoming, otherwise unsorted, data collection. The process of adding structure comes at a cost, as instead of simply appending incoming data, every new entry is inserted into the index. If the data ingestion order matches the indexed attribute order, the ingestion cost is entirely redundant and can be avoided (e.g., via bulk loading in a B+-tree). However, state-of-the-art index designs do not benefit when data is ingested in an order that is close to being sorted but not fully sorted. In this paper, we study how indexes can benefit from partial data sortedness or near-sortedness, and we propose an ensemble of techniques that combine bulk loading, index appends, variable node fill/split factor, and buffering, to optimize the ingestion cost of a tree index in presence of partial data sortedness. We further augment the proposed design with necessary metadata structures to ensure competitive read performance. We apply the proposed design paradigm on a state-of-the-art B+-tree, and we propose the Ordered Sort-Merge tree (OSM-tree). OSM-tree outperforms the state of the art by up to 8.8x in ingestion performance in the presence of sortedness, while falling back to a B+-tree's ingestion performance when data is scrambled. OSM-tree offers competitive query performance, leading to performance benefits between 28% and 5x for mixed read/write workloads.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源