Paper Title

The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima

Paper Authors

Peter L. Bartlett, Philip M. Long, Olivier Bousquet

Paper Abstract

We consider Sharpness-Aware Minimization (SAM), a gradient-based optimization method for deep networks that has exhibited performance improvements on image and language prediction problems. We show that when SAM is applied with a convex quadratic objective, for most random initializations it converges to a cycle that oscillates between either side of the minimum in the direction with the largest curvature, and we provide bounds on the rate of convergence. In the non-quadratic case, we show that such oscillations effectively perform gradient descent, with a smaller step-size, on the spectral norm of the Hessian. In such cases, SAM's update may be regarded as a third derivative -- the derivative of the Hessian in the leading eigenvector direction -- that encourages drift toward wider minima.
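To make the dynamics described in the abstract concrete, the following is a minimal NumPy sketch, not the authors' code, of the standard SAM update applied to a convex quadratic loss. The Hessian H, step size eta, and perturbation radius rho below are illustrative assumptions chosen so that the bouncing across the minimum in the largest-curvature direction is easy to observe.

```python
import numpy as np

# Illustrative sketch of the SAM update
#   w_{t+1} = w_t - eta * grad(w_t + rho * grad(w_t) / ||grad(w_t)||)
# on the convex quadratic L(w) = 0.5 * w^T H w.
# All constants here are assumptions, not values from the paper.

H = np.diag([10.0, 1.0])      # Hessian: largest curvature along the first coordinate

def grad(w):
    return H @ w               # gradient of the quadratic loss

eta, rho = 0.05, 0.1           # assumed step size and perturbation radius
w = np.array([1.0, 1.0])

for _ in range(500):
    g = grad(w)
    # ascent step: perturb along the normalized gradient direction
    w_adv = w + rho * g / (np.linalg.norm(g) + 1e-12)
    # descent step using the gradient taken at the perturbed point
    w = w - eta * grad(w_adv)

# The low-curvature coordinate decays towards zero, while the first coordinate
# keeps flipping sign: the iterate bounces between either side of the minimum
# in the direction of largest curvature, as the abstract describes.
print(w)
```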
