Paper Title
The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima
Paper Authors
Paper Abstract
We consider Sharpness-Aware Minimization (SAM), a gradient-based optimization method for deep networks that has exhibited performance improvements on image and language prediction problems. We show that when SAM is applied with a convex quadratic objective, for most random initializations it converges to a cycle that oscillates between either side of the minimum in the direction with the largest curvature, and we provide bounds on the rate of convergence. In the non-quadratic case, we show that such oscillations effectively perform gradient descent, with a smaller step-size, on the spectral norm of the Hessian. In such cases, SAM's update may be regarded as a third derivative -- the derivative of the Hessian in the leading eigenvector direction -- that encourages drift toward wider minima.
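The cycle described above can be illustrated numerically. The sketch below is a minimal simulation, not the paper's analysis: it runs SAM (ascent step normalized to radius `rho`, then a gradient step from the perturbed point) on a 2-D convex quadratic; the curvatures, step size `ETA`, and radius `RHO` are illustrative choices.

```python
import math

# Convex quadratic L(x) = 0.5 * (a1*x1^2 + a2*x2^2).
# Largest curvature is along coordinate 1, so the predicted
# oscillation happens across the ravine in that direction.
A = (10.0, 1.0)
ETA, RHO = 0.1, 0.05   # illustrative step size and perturbation radius

def grad(x):
    return [a * xi for a, xi in zip(A, x)]

def sam_step(x):
    g = grad(x)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    # ascend to the perturbed point, then descend with its gradient
    x_adv = [xi + RHO * gi / norm for xi, gi in zip(x, g)]
    g_adv = grad(x_adv)
    return [xi - ETA * gi for xi, gi in zip(x, g_adv)]

x = [0.5, 0.5]
for _ in range(500):
    x = sam_step(x)

# The iterate settles into a 2-cycle: it flips sign across the minimum
# along the high-curvature direction on every step, while the
# low-curvature component decays toward zero.
print(x, "->", sam_step(x))
```

Rather than converging to the minimum at the origin, the iterate ends up bouncing between two points on either side of it along the leading-eigenvector direction, matching the cycle the abstract describes.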