论文标题

缩放和负载平衡等电量

Scaling and Load-Balancing Equi-Joins

论文作者

Metwally, Ahmed

论文摘要

加入两个表的任务是查询数据库的基础。在本文中,我们关注的是等级加入问题,其中两个加入表中的一对记录是联接结果的一部分,如果相等性在联接列中的值之间保持相等性。尽管这是一个可以解决的问题时,当连接表中的记录数相对较小时,随着桌子尺寸的增加,它变得非常具有挑战性,尤其是在两个连接表中都有热键(加入列值的连接列值(与大量记录的连接值)时。 本文是[Metwally-Sigmod-2022]的扩展版本,提出了适应性 - 培养基 - 加入(AM-JOIN),用于在分布式共享的架构中进行可扩展和快速的Equi-Joins。 Am-Join利用(A)Tree-Join,这是一种提出的小说算法,当连接桌子共享热键时,它可以很好地缩放,并且(b)广播 - 加入Join,这是加入仅在一张桌子中热的键时已知的最快的钥匙。 与最先进的算法不同,AM-JOIN(a)通过在整个联接执行过程中实现负载平衡来整体解决联接问题问题,并且(b)支持所有外部加入变体,而无需记录重复程序或自定义表分区。对于最快的AM-JOIN外部加入性能,我们提出了用于小型大型连接的算法算法的索引广播 - 加入(IB-JOIN)家族,其中一张桌子适合内存,另一个桌子可以达到数量级。 IB-Join的外部加入变体改善了最先进的小型外部加入算法。 所提出的算法可以在任何共享的架构中采用。我们使用Spark实施了MapReduce版本。我们的评估表明,与最先进的算法相比,提出的算法执行速度明显更快,并扩展到更偏斜的偏斜和尺寸更大的表。

The task of joining two tables is fundamental for querying databases. In this paper, we focus on the equi-join problem, where a pair of records from the two joined tables are part of the join results if equality holds between their values in the join column(s). While this is a tractable problem when the number of records in the joined tables is relatively small, it becomes very challenging as the table sizes increase, especially if hot keys (join column values with a large number of records) exist in both joined tables. This paper, an extended version of [metwally-SIGMOD-2022], proposes Adaptive-Multistage-Join (AM-Join) for scalable and fast equi-joins in distributed shared-nothing architectures. AM-Join utilizes (a) Tree-Join, a proposed novel algorithm that scales well when the joined tables share hot keys, and (b) Broadcast-Join, the known fastest when joining keys that are hot in only one table. Unlike the state-of-the-art algorithms, AM-Join (a) holistically solves the join-skew problem by achieving load balancing throughout the join execution, and (b) supports all outer-join variants without record deduplication or custom table partitioning. For the fastest AM-Join outer-join performance, we propose the Index-Broadcast-Join (IB-Join) family of algorithms for Small-Large joins, where one table fits in memory and the other can be up to orders of magnitude larger. The outer-join variants of IB-Join improves on the state-of-the-art Small-Large outer-join algorithms. The proposed algorithms can be adopted in any shared-nothing architecture. We implemented a MapReduce version using Spark. Our evaluation shows the proposed algorithms execute significantly faster and scale to more skewed and orders-of-magnitude bigger tables when compared to the state-of-the-art algorithms.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源