Paper Title
The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima
Paper Authors
Paper Abstract
We consider Sharpness-Aware Minimization (SAM), a gradient-based optimization method for deep networks that has exhibited performance improvements on image and language prediction problems. We show that when SAM is applied with a convex quadratic objective, for most random initializations it converges to a cycle that oscillates between either side of the minimum in the direction with the largest curvature, and we provide bounds on the rate of convergence. In the non-quadratic case, we show that such oscillations effectively perform gradient descent, with a smaller step-size, on the spectral norm of the Hessian. In such cases, SAM's update may be regarded as a third derivative -- the derivative of the Hessian in the leading eigenvector direction -- that encourages drift toward wider minima.
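The cycle described above can be illustrated numerically. The sketch below is a minimal simulation, not the paper's analysis: it runs SAM (ascent step normalized to radius `rho`, then a gradient step from the perturbed point) on a 2-D convex quadratic; the curvatures, step size `ETA`, and radius `RHO` are illustrative choices.

```python
import math

# Convex quadratic L(x) = 0.5 * (a1*x1^2 + a2*x2^2).
# Largest curvature is along coordinate 1, so the predicted
# oscillation happens across the ravine in that direction.
A = (10.0, 1.0)
ETA, RHO = 0.1, 0.05   # illustrative step size and perturbation radius

def grad(x):
    return [a * xi for a, xi in zip(A, x)]

def sam_step(x):
    g = grad(x)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    # ascend to the perturbed point, then descend with its gradient
    x_adv = [xi + RHO * gi / norm for xi, gi in zip(x, g)]
    g_adv = grad(x_adv)
    return [xi - ETA * gi for xi, gi in zip(x, g_adv)]

x = [0.5, 0.5]
for _ in range(500):
    x = sam_step(x)

# The iterate settles into a 2-cycle: it flips sign across the minimum
# along the high-curvature direction on every step, while the
# low-curvature component decays toward zero.
print(x, "->", sam_step(x))
```

Rather than converging to the minimum at the origin, the iterate ends up bouncing between two points on either side of it along the leading-eigenvector direction, matching the cycle the abstract describes.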