Paper Title
OLLA: Optimizing the Lifetime and Location of Arrays to Reduce the Memory Usage of Neural Networks
Paper Authors
Paper Abstract
The size of deep neural networks has grown exponentially in recent years. Unfortunately, hardware devices have not kept pace with the rapidly increasing memory requirements. To cope with this, researchers have turned to techniques such as spilling and recomputation, which increase training time, or reduced precision and model pruning, which can affect model accuracy. We present OLLA, an algorithm that optimizes the lifetime and memory location of the tensors used to train neural networks. Our method reduces the memory usage of existing neural networks without requiring any modification to the models or their training procedures. We formulate the problem as a joint integer linear program (ILP) and present several techniques that simplify its encoding and enable our approach to scale to the size of state-of-the-art neural networks using an off-the-shelf ILP solver. We experimentally demonstrate that OLLA takes only minutes, if not seconds, to enable training neural networks with one-third less memory on average.
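
To make the idea concrete, below is a minimal sketch, in Python with the PuLP modeling library and its bundled CBC solver, of the memory-location half of such a formulation: tensors with known sizes and lifetime intervals are assigned address offsets so that tensors whose lifetimes overlap never share memory, while the peak offset in use is minimized. The tensor names, sizes, lifetimes, and the PuLP/CBC solver choice are illustrative assumptions only, not the paper's actual encoding, which additionally optimizes the lifetimes themselves.

# Toy sketch: assign memory offsets to tensors with fixed lifetimes so that
# tensors alive at the same time never overlap in address space, while
# minimizing the peak address used. Illustrative only; the tensor data below
# is made up and PuLP/CBC stands in for an off-the-shelf ILP solver.
from pulp import LpProblem, LpMinimize, LpVariable, LpBinary, value

# tensor name -> (size in bytes, first op index alive, last op index alive)
tensors = {
    "act0": (512, 0, 3),
    "act1": (256, 1, 4),
    "grad0": (512, 3, 5),
    "grad1": (256, 2, 5),
}

M = sum(size for size, _, _ in tensors.values())  # big-M bound on any offset

prob = LpProblem("tensor_placement", LpMinimize)
offset = {t: LpVariable(f"off_{t}", lowBound=0) for t in tensors}
peak = LpVariable("peak_memory", lowBound=0)
prob += peak  # objective: minimize the peak address in use

names = list(tensors)
for i, a in enumerate(names):
    size_a, start_a, end_a = tensors[a]
    prob += offset[a] + size_a <= peak  # every tensor must fit under the peak
    for b in names[i + 1:]:
        size_b, start_b, end_b = tensors[b]
        if start_a <= end_b and start_b <= end_a:  # lifetimes overlap
            # disjunction: a sits entirely below b, or b entirely below a
            below = LpVariable(f"below_{a}_{b}", cat=LpBinary)
            prob += offset[a] + size_a <= offset[b] + M * (1 - below)
            prob += offset[b] + size_b <= offset[a] + M * below

prob.solve()
print("peak memory:", value(peak))
for t in tensors:
    print(t, "at offset", value(offset[t]))

The joint formulation described in the abstract goes further: by also deciding when each tensor is created and freed (subject to the data dependencies of the training graph), the lifetime intervals above become decision variables rather than fixed inputs, which is what lets the solver shrink peak memory beyond what placement alone can achieve.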