Paper Title

On regularization of gradient descent, layer imbalance and flat minima

Authors

Ginsburg, Boris

Abstract

We analyze the training dynamics for deep linear networks using a new metric - layer imbalance - which defines the flatness of a solution. We demonstrate that different regularization methods, such as weight decay or noise data augmentation, behave in a similar way. Training has two distinct phases: 1) optimization and 2) regularization. First, during the optimization phase, the loss function monotonically decreases, and the trajectory goes toward a minima manifold. Then, during the regularization phase, the layer imbalance decreases, and the trajectory goes along the minima manifold toward a flat area. Finally, we extend the analysis for stochastic gradient descent and show that SGD works similarly to noise regularization.
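The abstract's central object is the layer imbalance metric for a deep linear network f(x) = W_L ... W_2 W_1 x. The paper's exact definition is not reproduced on this page; the sketch below assumes the standard balancedness gap between adjacent layers, ||W_{i+1}^T W_{i+1} - W_i W_i^T||, maximized over i, which is a common flatness-related quantity in deep-linear-network analyses. The function name layer_imbalance is hypothetical.

```python
import numpy as np

def layer_imbalance(weights):
    """Illustrative layer-imbalance metric for a deep linear network
    f(x) = W_L @ ... @ W_1 @ x.

    Assumption: imbalance is the largest Frobenius norm of
    W_{i+1}^T W_{i+1} - W_i W_i^T over adjacent layer pairs; the
    paper's precise definition may differ.
    """
    gaps = [
        np.linalg.norm(weights[i + 1].T @ weights[i + 1]
                       - weights[i] @ weights[i].T)
        for i in range(len(weights) - 1)
    ]
    return max(gaps)

# Toy example: a 3-layer linear network with random square weights
# has a large imbalance, i.e. an "unbalanced" (sharp) factorization.
rng = np.random.default_rng(0)
ws = [rng.standard_normal((4, 4)) for _ in range(3)]
print(layer_imbalance(ws))

# A balanced factorization (all layers equal to one symmetric matrix)
# gives zero imbalance, corresponding to a flat solution in the
# abstract's sense: W^T W - W W^T = 0 when W is symmetric.
s = rng.standard_normal((4, 4))
s = (s + s.T) / 2
print(layer_imbalance([s, s, s]))  # ~0
```

Under this reading, the regularization phase the abstract describes (weight decay, noise augmentation, or SGD noise) moves the trajectory along the minima manifold toward factorizations where this gap shrinks.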
