Paper Title
The Underlying Correlated Dynamics in Neural Training
Paper Authors
Paper Abstract
Training of neural networks is a computationally intensive task. As increasingly large networks are trained, the importance of understanding and modeling the training dynamics grows. In this work we propose a model based on the correlation of the parameters' dynamics, which dramatically reduces the dimensionality. We refer to our algorithm as \emph{correlation mode decomposition} (CMD). It splits the parameter space into groups of parameters (modes) that behave in a highly correlated manner across epochs. With this approach we achieve a remarkable dimensionality reduction: networks such as ResNet-18, transformers, and GANs, containing millions of parameters, can be modeled well using just a few modes. We observe that the typical time profile of each mode is spread across all layers of the network. Moreover, our model induces a regularization effect, which yields better generalization on the test set. This representation enhances the understanding of the underlying training dynamics and can pave the way for designing better acceleration techniques.
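To make the abstract's central idea concrete (grouping parameters whose training trajectories are highly correlated and modeling each group by a single time profile), the following is a minimal NumPy sketch. It is an illustrative reconstruction, not the paper's implementation: the function name cmd_modes, the random choice of reference trajectories, and the affine per-parameter model w_i(t) ≈ a_i * c_m(t) + b_i are all assumptions made for this example.

```python
# Minimal sketch of a CMD-style decomposition (illustrative, not the authors' code).
import numpy as np

def cmd_modes(trajectories: np.ndarray, num_modes: int, seed: int = 0):
    """Group parameters into `num_modes` modes by trajectory correlation.

    trajectories: array of shape (num_params, num_epochs), one row per parameter,
        holding that parameter's value at each recorded epoch.
    Returns (assignments, profiles, a, b) so that parameter i is approximated by
        a[i] * profiles[assignments[i]] + b[i].
    """
    rng = np.random.default_rng(seed)
    num_params, num_epochs = trajectories.shape

    # Normalize each trajectory so correlation reduces to a dot product.
    centered = trajectories - trajectories.mean(axis=1, keepdims=True)
    normalized = centered / (np.linalg.norm(centered, axis=1, keepdims=True) + 1e-12)

    # Pick a few reference trajectories (assumption: random choice) and assign each
    # parameter to the reference it correlates with most strongly in absolute value,
    # since anti-correlated parameters still share a mode up to a sign flip.
    ref_idx = rng.choice(num_params, size=num_modes, replace=False)
    corr = normalized @ normalized[ref_idx].T          # (num_params, num_modes)
    assignments = np.abs(corr).argmax(axis=1)

    # Each mode's profile is the mean trajectory of its members; each parameter is
    # then fit by least squares as an affine function of its mode's profile.
    profiles = np.empty((num_modes, num_epochs))
    for m in range(num_modes):
        members = trajectories[assignments == m]
        profiles[m] = members.mean(axis=0) if len(members) else trajectories[ref_idx[m]]

    a = np.empty(num_params)
    b = np.empty(num_params)
    for i in range(num_params):
        A = np.stack([profiles[assignments[i]], np.ones(num_epochs)], axis=1)
        a[i], b[i] = np.linalg.lstsq(A, trajectories[i], rcond=None)[0]
    return assignments, profiles, a, b

# Toy usage: 10k synthetic "parameters" whose dynamics follow 3 hidden profiles plus noise.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    t = np.linspace(0.0, 1.0, 50)
    hidden = np.stack([np.exp(-3 * t), np.sin(2 * np.pi * t), t ** 2])
    labels = rng.integers(0, 3, size=10_000)
    W = hidden[labels] * rng.normal(1.0, 0.3, (10_000, 1)) + rng.normal(0.0, 0.01, (10_000, 50))
    assignments, profiles, a, b = cmd_modes(W, num_modes=3)
    recon = a[:, None] * profiles[assignments] + b[:, None]
    print("relative reconstruction error:", np.linalg.norm(recon - W) / np.linalg.norm(W))
```

In this sketch the dimensionality reduction is the point: all rows of W are summarized by a handful of mode profiles plus two scalars per parameter, mirroring the abstract's claim that millions of parameters can be modeled with just a few modes.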