Paper title

Cutting Some Slack for SGD with Adaptive Polyak Stepsizes

Authors

Gower, Robert M., Blondel, Mathieu, Gazagnadou, Nidham, Pedregosa, Fabian

Abstract

Tuning the step size of stochastic gradient descent is tedious and error-prone. This has motivated the development of methods that automatically adapt the step size using readily available information. In this paper, we consider the family of SPS (Stochastic gradient with a Polyak Stepsize) adaptive methods. These methods use the gradient and loss value at the sampled points to adaptively adjust the step size. We first show that SPS and its recent variants can all be seen as extensions of the Passive-Aggressive methods applied to nonlinear problems. We use this insight to develop new variants of the SPS method that are better suited to nonlinear models. Our new variants are based on introducing a slack variable into the interpolation equations. This single slack variable tracks the loss function across iterations and is used in setting a stable step size. We provide a convergence theory and extensive numerical results supporting our new methods.
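As a rough illustration of how an SPS step uses the loss value and gradient at the sampled point, here is a minimal sketch of the basic SPS update (not the slack-variable variants proposed in the paper), assuming a least-squares model where interpolation holds. Projecting x_t onto the linearised interpolation condition f_i(x_t) + ⟨∇f_i(x_t), x − x_t⟩ = f_i^* gives x_{t+1} = x_t − (f_i(x_t) − f_i^*) / ‖∇f_i(x_t)‖² · ∇f_i(x_t), which is one way to see the Passive-Aggressive-style projection the abstract mentions. The helper names `dot` and `sps_least_squares`, the optional `gamma_max` cap, and the demo data are illustrative assumptions.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sps_least_squares(A, b, num_iters=2000, f_star=0.0,
                      gamma_max=float("inf"), seed=0):
    """Hypothetical helper: SGD with a Polyak stepsize on
    f(x) = (1/n) * sum_i 0.5 * (a_i . x - b_i)^2.

    At each step one row i is sampled uniformly and the stepsize is
        gamma = min((f_i(x) - f_star) / ||grad f_i(x)||^2, gamma_max).
    f_star is an assumed lower bound on every f_i (0 when the
    interpolation condition b = A x_true holds); gamma_max is an
    optional cap on the stepsize.
    """
    rng = random.Random(seed)
    x = [0.0] * len(A[0])
    for _ in range(num_iters):
        i = rng.randrange(len(A))
        residual = dot(A[i], x) - b[i]
        loss_i = 0.5 * residual ** 2
        grad = [residual * a for a in A[i]]  # gradient of f_i at x
        grad_sq = dot(grad, grad)
        if grad_sq == 0.0:
            continue  # zero gradient: this sample gives no descent direction
        gamma = min((loss_i - f_star) / grad_sq, gamma_max)
        x = [xj - gamma * gj for xj, gj in zip(x, grad)]
    return x


# Hypothetical consistent system: b = A @ [1, -2], so every f_i can reach 0.
A_demo = [[1.0, 2.0], [3.0, 1.0], [0.0, 1.0], [2.0, -1.0]]
b_demo = [-3.0, 1.0, -2.0, 4.0]
print(sps_least_squares(A_demo, b_demo))  # should be close to [1.0, -2.0]
```

For this squared loss the Polyak step reduces to gamma = 1/(2‖a_i‖²), so each update moves x halfway toward the hyperplane a_i · x = b_i; no learning rate has to be tuned by hand.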