论文标题
孢子:大规模线性代数的关系相等性饱和的总和优化
SPORES: Sum-Product Optimization via Relational Equality Saturation for Large Scale Linear Algebra
论文作者
论文摘要
机器学习算法通常在线性代数(LA)中指定。 LA表达式可以通过利用输入属性(例如稀疏性)以及程序属性(例如常见的子表达和可易熔操作符)来重写更有效的形式。这些属性对执行成本的影响之间的复杂相互作用构成了优化编译器的挑战。现有的编译器诉诸于复杂的启发式方法,使代码库复杂化并增加了维护成本,但无法搜索相同的LA表达式,以找到最便宜的la表达式。我们通过将LA表达式转换为关系代数(RA)表达式,优化后者,然后将结果转换回(优化)LA表达式来引入LA表达式的一般优化技术。该方法的主要优点是它已经完成,这意味着可以使用RA中的等效规则找到任何等效的LA表达式。挑战是搜索空间的主要规模,我们通过采用和扩展一种称为“等价饱和的编译器”中的技术来解决此问题。我们将优化器集成到SystemML中,并在一系列机器学习任务中进行经验验证。我们表明,我们可以在SystemML中得出所有现有的手动编码优化,并执行新的优化,从而导致1.2倍至5倍加速。
Machine learning algorithms are commonly specified in linear algebra (LA). LA expressions can be rewritten into more efficient forms, by taking advantage of input properties such as sparsity, as well as program properties such as common subexpressions and fusible operators. The complex interaction among these properties' impact on the execution cost poses a challenge to optimizing compilers. Existing compilers resort to intricate heuristics that complicate the codebase and add maintenance cost but fail to search through the large space of equivalent LA expressions to find the cheapest one. We introduce a general optimization technique for LA expressions, by converting the LA expressions into Relational Algebra (RA) expressions, optimizing the latter, then converting the result back to (optimized) LA expressions. One major advantage of this method is that it is complete, meaning that any equivalent LA expression can be found using the equivalence rules in RA. The challenge is the major size of the search space, and we address this by adopting and extending a technique used in compilers, called equality saturation. We integrate the optimizer into SystemML and validate it empirically across a spectrum of machine learning tasks; we show that we can derive all existing hand-coded optimizations in SystemML, and perform new optimizations that lead to speedups from 1.2X to 5X.