Paper Title
Cortex: Harnessing Correlations to Boost Query Performance
Paper Authors
Paper Abstract
Databases employ indexes to filter out irrelevant records, which reduces scan overhead and speeds up query execution. However, this optimization is only available to queries that filter on the indexed attribute. To extend these speedups to queries on other attributes, database systems have turned to secondary and multi-dimensional indexes. Unfortunately, these approaches are restrictive: secondary indexes have a large memory footprint and can only speed up queries that access a small number of records, and multi-dimensional indexes cannot scale to more than a handful of columns. We present Cortex, an approach that takes advantage of correlations to extend the reach of primary indexes to more attributes. Unlike prior work, Cortex can adapt itself to any existing primary index, whether single or multi-dimensional, to harness a broad variety of correlations, such as those that exist between more than two attributes or have a large number of outliers. We demonstrate that on real datasets exhibiting these diverse types of correlations, Cortex matches or outperforms traditional secondary indexes with $5\times$ less space, and it is $2$-$8\times$ faster than existing approaches to indexing correlations.
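The general idea of answering a filter on a non-indexed attribute through an existing primary index can be illustrated with a small sketch. This is not the algorithm from the paper; it is a simplified illustration under several assumptions: the table is a list of `(a, b)` tuples already sorted by the indexed attribute `a`, `b` holds integers, and the function names and the parameters `page_size`, `b_bucket_width`, and `outlier_threshold` are all hypothetical. Each bucket of `b` values is mapped to the pages of the sorted table where those values are concentrated, while rows that fall in sparsely populated pages are tracked individually as outliers.

```python
from collections import defaultdict


def build_correlation_map(rows, page_size, b_bucket_width, outlier_threshold):
    """Map each bucket of b to the pages (in a-sorted order) that hold it.

    rows: list of (a, b) tuples sorted by a; b is assumed to be an integer.
    A (bucket, page) pair with at most `outlier_threshold` rows is not
    recorded as a page; its rows are stored individually as outliers.
    """
    per_bucket = defaultdict(lambda: defaultdict(list))  # bucket -> page -> row ids
    for row_id, (_, b) in enumerate(rows):
        per_bucket[b // b_bucket_width][row_id // page_size].append(row_id)

    pages = defaultdict(set)      # bucket -> pages that must be scanned
    outliers = defaultdict(list)  # bucket -> individual row ids
    for bucket, per_page in per_bucket.items():
        for page, ids in per_page.items():
            if len(ids) <= outlier_threshold:
                outliers[bucket].extend(ids)
            else:
                pages[bucket].add(page)
    return pages, outliers


def query_b_range(rows, pages, outliers, page_size, b_bucket_width, lo, hi):
    """Return the sorted ids of rows with lo <= b <= hi."""
    candidates = set()
    for bucket in range(lo // b_bucket_width, hi // b_bucket_width + 1):
        # Scan only the pages where this bucket of b values is concentrated.
        for page in pages.get(bucket, ()):
            start = page * page_size
            candidates.update(range(start, min(start + page_size, len(rows))))
        # Outliers are looked up directly instead of scanning their pages.
        candidates.update(outliers.get(bucket, ()))
    # Candidates are a superset of the answer, so apply the exact filter.
    return sorted(i for i in candidates if lo <= rows[i][1] <= hi)


# b is strongly correlated with a, except for one outlier at row 50.
rows = [(i, 2 * i) for i in range(100)]
rows[50] = (50, 3)
pages, outliers = build_correlation_map(rows, 10, 20, 1)
matches = query_b_range(rows, pages, outliers, 10, 20, 0, 10)
```

In this toy table the query on `b` scans a single page plus one outlier row rather than all 100 rows. Storing the outlier separately is what keeps one stray value from forcing a scan of its whole page.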