Paper Title

On regularization of gradient descent, layer imbalance and flat minima

Authors

Ginsburg, Boris

Abstract

We analyze the training dynamics for deep linear networks using a new metric - layer imbalance - which defines the flatness of a solution. We demonstrate that different regularization methods, such as weight decay or noise data augmentation, behave in a similar way. Training has two distinct phases: 1) optimization and 2) regularization. First, during the optimization phase, the loss function monotonically decreases, and the trajectory goes toward a minima manifold. Then, during the regularization phase, the layer imbalance decreases, and the trajectory goes along the minima manifold toward a flat area. Finally, we extend the analysis for stochastic gradient descent and show that SGD works similarly to noise regularization.
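The abstract's central object is the layer imbalance metric for a deep linear network f(x) = W_L ... W_2 W_1 x. The paper's exact definition is not reproduced on this page; the sketch below assumes the standard balancedness gap between adjacent layers, ||W_{i+1}^T W_{i+1} - W_i W_i^T||, maximized over i, which is a common flatness-related quantity in deep-linear-network analyses. The function name layer_imbalance is hypothetical.

```python
import numpy as np

def layer_imbalance(weights):
    """Illustrative layer-imbalance metric for a deep linear network
    f(x) = W_L @ ... @ W_1 @ x.

    Assumption: imbalance is the largest Frobenius norm of
    W_{i+1}^T W_{i+1} - W_i W_i^T over adjacent layer pairs; the
    paper's precise definition may differ.
    """
    gaps = [
        np.linalg.norm(weights[i + 1].T @ weights[i + 1]
                       - weights[i] @ weights[i].T)
        for i in range(len(weights) - 1)
    ]
    return max(gaps)

# Toy example: a 3-layer linear network with random square weights
# has a large imbalance, i.e. an "unbalanced" (sharp) factorization.
rng = np.random.default_rng(0)
ws = [rng.standard_normal((4, 4)) for _ in range(3)]
print(layer_imbalance(ws))

# A balanced factorization (all layers equal to one symmetric matrix)
# gives zero imbalance, corresponding to a flat solution in the
# abstract's sense: W^T W - W W^T = 0 when W is symmetric.
s = rng.standard_normal((4, 4))
s = (s + s.T) / 2
print(layer_imbalance([s, s, s]))  # ~0
```

Under this reading, the regularization phase the abstract describes (weight decay, noise augmentation, or SGD noise) moves the trajectory along the minima manifold toward factorizations where this gap shrinks.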
