Title

Implementing Window Functions in a Column-Store with Late Materialization (Extended Version)

Authors

Nadezhda Mukhaleva, Valentin Grigorev, George Chernishev

Abstract

A window function is a generalization of the aggregation operation. Unlike aggregation, however, the cardinality of its output is always the same as the cardinality of its input. That is, the semantics of this operator imply computing values of extra attributes for each row, depending on the row's context, which is expressed either by a sliding window or by a previously evaluated row. Window functions are a very powerful tool: they are popular among data analysts and supported by the majority of industrial DBMSes. They make it possible to gracefully express quite complex use cases, such as running sums and averages, local maxima and minima, and different types of ranking. Since such queries can be expressed without self-joins and correlated subqueries, they can be evaluated much more efficiently.

In this paper we discuss an implementation of window functions inside a disk-based column-store with late materialization. Late materialization is a technique that aims to postpone the reconstruction of tuples from individual columns for as long as possible. Initially popular in the late 2000s, it is rarely considered nowadays. However, in the case of window functions it can substantially lower the memory footprint. Another contribution of this paper is the application of a segment tree to computing RANGE-based window functions.
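To make the RANGE-based case concrete, here is a minimal Python sketch of evaluating a RANGE-framed aggregate with a segment tree. It assumes a single partition whose rows are already sorted on a numeric ORDER BY key, and a frame of the form `RANGE BETWEEN p PRECEDING AND f FOLLOWING`. The names `SegmentTree` and `range_window` are hypothetical and do not come from the paper's implementation, which may apply segment trees differently. Frame boundaries are found by binary search on the sorted keys (so peers with equal keys share a frame), and each frame is then aggregated with an O(log n) tree query. The function returns exactly one value per input row, matching the cardinality-preserving semantics described above.

```python
import operator
from bisect import bisect_left, bisect_right
from typing import Callable, Generic, List, Sequence, TypeVar

T = TypeVar("T")


class SegmentTree(Generic[T]):
    """Hypothetical bottom-up segment tree over an associative `combine` with neutral `identity`."""

    def __init__(self, values: Sequence[T], combine: Callable[[T, T], T], identity: T) -> None:
        self.n = len(values)
        self.combine = combine
        self.identity = identity
        # Leaves occupy tree[n:2n]; internal node i aggregates nodes 2i and 2i+1.
        self.tree: List[T] = [identity] * self.n + list(values)
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = combine(self.tree[2 * i], self.tree[2 * i + 1])

    def query(self, lo: int, hi: int) -> T:
        """Aggregate values[lo:hi] (half-open interval) in O(log n) steps."""
        left, right = self.identity, self.identity
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:  # lo is a right child: take it and move past it
                left = self.combine(left, self.tree[lo])
                lo += 1
            if hi & 1:  # hi - 1 is a left child: take it from the right end
                hi -= 1
                right = self.combine(self.tree[hi], right)
            lo //= 2
            hi //= 2
        return self.combine(left, right)


def range_window(
    keys: Sequence[float],
    values: Sequence[T],
    preceding: float,
    following: float,
    combine: Callable[[T, T], T],
    identity: T,
) -> List[T]:
    """agg(value) OVER (ORDER BY key RANGE BETWEEN preceding PRECEDING AND following FOLLOWING).

    Assumption: one partition whose rows are already sorted ascending by key.
    """
    tree = SegmentTree(values, combine, identity)
    out: List[T] = []
    for k in keys:
        # RANGE frames are defined on key values, so rows with equal keys (peers) share a frame.
        lo = bisect_left(keys, k - preceding)
        hi = bisect_right(keys, k + following)
        out.append(tree.query(lo, hi))
    return out  # exactly one output value per input row


keys = [1, 2, 2, 5, 6, 9]
values = [10, 4, 7, 3, 8, 1]
print(range_window(keys, values, 1, 1, operator.add, 0))      # [21, 21, 21, 11, 11, 1]
print(range_window(keys, values, 1, 1, max, float("-inf")))  # [10, 10, 10, 8, 8, 1]
```

For SUM alone, prefix sums would suffice. A segment tree pays off for aggregates such as MIN and MAX, which cannot be computed by subtracting one prefix aggregate from another.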