论文标题

分析沿GD轨迹的清晰度:逐步锐化和稳定边缘

Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability

论文作者

Li, Zhouzi, Wang, Zixuan, Li, Jian

论文摘要

最近的发现(例如ARXIV:2103.00065)表明,通过全批梯度下降训练的现代神经网络通常进入一个称为稳定边缘(EOS)的政权。在此制度中,清晰度(即最大的Hessian特征值)首先增加到值2/(步长尺寸)(逐步锐化阶段),然后围绕该值(EOS相)振荡。本文旨在分析沿优化轨迹的GD动力学和清晰度。我们的分析自然将GD轨迹分为四个阶段,具体取决于清晰度的变化。从经验上,我们将输出层重量的标准识别为清晰度动力学的有趣指标。基于这种经验观察,我们尝试从理论和经验上解释各种关键量的动力学,从而导致EOS每个阶段的清晰度变化。此外,基于某些假设,我们提供了两层完全连接的线性神经网络中EOS制度的清晰度行为的理论证明。我们还讨论了其他一些经验发现以及我们的理论结果的局限性。

Recent findings (e.g., arXiv:2103.00065) demonstrate that modern neural networks trained by full-batch gradient descent typically enter a regime called Edge of Stability (EOS). In this regime, the sharpness, i.e., the maximum Hessian eigenvalue, first increases to the value 2/(step size) (the progressive sharpening phase) and then oscillates around this value (the EOS phase). This paper aims to analyze the GD dynamics and the sharpness along the optimization trajectory. Our analysis naturally divides the GD trajectory into four phases depending on the change of the sharpness. We empirically identify the norm of output layer weight as an interesting indicator of sharpness dynamics. Based on this empirical observation, we attempt to theoretically and empirically explain the dynamics of various key quantities that lead to the change of sharpness in each phase of EOS. Moreover, based on certain assumptions, we provide a theoretical proof of the sharpness behavior in EOS regime in two-layer fully-connected linear neural networks. We also discuss some other empirical findings and the limitation of our theoretical results.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源