Paper Title

Differentiating a Tensor Language

Paper Authors

Gilbert Bernstein, Michael Mara, Tzu-Mao Li, Dougal Maclaurin, Jonathan Ragan-Kelley

Paper Abstract

How does one compile derivatives of tensor programs, such that the resulting code is purely functional (hence easier to optimize and parallelize) and provably efficient relative to the original program? We show that naively differentiating tensor code---as done in popular systems like TensorFlow and PyTorch---can cause asymptotic slowdowns in pathological cases, violating the Cheap Gradients Principle. However, all existing automatic differentiation methods that guarantee this principle (for variable size data) do so by relying on += mutation through aliases/pointers---which complicates downstream optimization. We provide the first purely functional, provably efficient, adjoint/reverse-mode derivatives of array/tensor code by explicitly accounting for sparsity. We do this by focusing on the indicator function from Iverson's APL. We also introduce a new "Tensor SSA" normal form and a new derivation of reverse-mode automatic differentiation based on the universal property of inner-products.
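
To make the abstract's central claim concrete, here is a minimal Python/NumPy sketch, our own illustration rather than code from the paper. It shows why the adjoint of a gather `y = x[i]` is pathological when materialized densely (O(n) work for an O(1) primal), and how keeping the derivative as Iverson's indicator `[j == i]` preserves sparsity. The names `naive_gather_adjoint`, `sparse_gather_adjoint`, and `iverson` are hypothetical helpers introduced here for illustration.

```python
import numpy as np

def naive_gather_adjoint(n: int, i: int, dy: float) -> np.ndarray:
    """Dense adjoint of y = x[i]: O(n) work and storage for an O(1) primal."""
    dx = np.zeros(n)  # materializes a full-size one-hot vector
    dx[i] = dy
    return dx

def iverson(pred: bool) -> float:
    """Iverson bracket from APL: 1.0 if the predicate holds, else 0.0."""
    return 1.0 if pred else 0.0

def sparse_gather_adjoint(i: int, dy: float):
    """Adjoint kept as the sparse expression dy * [j == i]: O(1) to build,
    and only ever evaluated at the indices a consumer actually touches."""
    return lambda j: dy * iverson(j == i)

n, i = 1_000_000, 42
dense = naive_gather_adjoint(n, i, 1.0)   # allocates a million zeros
sparse = sparse_gather_adjoint(i, 1.0)    # constant-time closure
assert dense[i] == sparse(i) == 1.0
assert dense[7] == sparse(7) == 0.0
```

The gap between the two grows with `n`: the dense adjoint's cost scales with the array size while the sparse form stays constant, which is the kind of asymptotic slowdown the paper identifies and the sparsity-aware derivative it proposes avoids, all without resorting to += mutation.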
