Paper Title
DIVISION: Memory Efficient Training via Dual Activation Precision
Paper Authors
Paper Abstract
Activation compressed training provides a solution towards reducing the memory cost of training deep neural networks~(DNNs). However, state-of-the-art work combines a search of quantization bit-width with the training, which makes the procedure complicated and less transparent. To this end, we propose a simple and effective method to compress DNN training. Our method is motivated by an instructive observation: DNN backward propagation mainly utilizes the low-frequency component (LFC) of the activation maps, while the majority of memory is for caching the high-frequency component (HFC) during the training. This indicates that the HFC of activation maps is highly redundant and compressible during DNN training, which inspires our proposed Dual Activation Precision (DIVISION). During the training, DIVISION preserves a high-precision copy of the LFC and compresses the HFC into a lightweight copy with low numerical precision. This significantly reduces the memory cost without negatively affecting the precision of backward propagation, so that DIVISION maintains competitive model accuracy. Experimental results show that DIVISION achieves better comprehensive performance than state-of-the-art methods, including over 10x compression of activation maps and competitive training throughput, without loss of model accuracy.
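
A minimal sketch of the dual-precision caching idea described in the abstract, not the authors' released implementation. Assumptions: the LFC is approximated by an average-pooled copy of the activation (a simple low-pass filter), the HFC is the residual after upsampling that copy, and the HFC is quantized with uniform min-max quantization to a low bit-width (2 bits here). The function names split_and_compress and decompress are hypothetical.

# Hypothetical sketch of dual activation precision caching (PyTorch).
import torch
import torch.nn.functional as F

def split_and_compress(activation, pool_size=4, bits=2):
    """Split a 4-D activation map into a high-precision LFC and a quantized HFC."""
    # Low-frequency component: a small, high-precision average-pooled copy.
    lfc = F.avg_pool2d(activation, pool_size)
    # High-frequency component: residual after upsampling the LFC back to full size.
    hfc = activation - F.interpolate(lfc, size=activation.shape[-2:], mode="nearest")
    # Uniform min-max quantization of the HFC to 2**bits levels (assumed scheme).
    levels = 2 ** bits - 1
    hfc_min, hfc_max = hfc.min(), hfc.max()
    scale = (hfc_max - hfc_min).clamp_min(1e-8) / levels
    hfc_q = torch.round((hfc - hfc_min) / scale).to(torch.uint8)  # lightweight copy
    return lfc, (hfc_q, hfc_min, scale)

def decompress(lfc, hfc_packed, out_size):
    """Reconstruct an approximate activation map for backward propagation."""
    hfc_q, hfc_min, scale = hfc_packed
    hfc = hfc_q.float() * scale + hfc_min
    return F.interpolate(lfc, size=out_size, mode="nearest") + hfc

In an activation-compressed training loop, the two copies returned by split_and_compress would be cached in place of the full-precision activation during the forward pass, and decompress would rebuild an approximate activation when the backward pass needs it; the memory saving comes from the LFC being spatially small and the HFC being stored at low bit-width.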