Paper Title
An Analysis of Constant Step Size SGD in the Non-convex Regime: Asymptotic Normality and Bias
Paper Authors
Paper Abstract
Structured non-convex learning problems, for which critical points have favorable statistical properties, arise frequently in statistical machine learning. Algorithmic convergence and statistical estimation rates are well understood for such problems. However, quantifying the uncertainty associated with the underlying training algorithm is not well studied in the non-convex setting. To address this shortcoming, in this work we establish an asymptotic normality result for the constant step size stochastic gradient descent (SGD) algorithm, which is widely used in practice. Specifically, based on the relationship between SGD and Markov chains [DDB19], we show that the average of SGD iterates is asymptotically normally distributed around the expected value of their unique invariant distribution, as long as the non-convex and non-smooth objective function satisfies a dissipativity property. We also characterize the bias between this expected value and the critical points of the objective function under various local regularity conditions. Together, these two results can be leveraged to construct confidence intervals for non-convex problems that are trained using the SGD algorithm.
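As a rough sketch of the setting described in the abstract (the notation below, including the step size $\gamma$, the noise sequence $\varepsilon_{k+1}$, and the constants in the dissipativity condition, is illustrative and not taken from the paper itself), the constant step size SGD recursion and the two results can be summarized as follows:
\[
\theta_{k+1} \;=\; \theta_k \;-\; \gamma \left( \nabla f(\theta_k) + \varepsilon_{k+1} \right), \qquad \gamma > 0 \text{ fixed},
\]
where $\varepsilon_{k+1}$ denotes zero-mean stochastic gradient noise. A standard form of the dissipativity assumption is
\[
\langle \nabla f(\theta), \theta \rangle \;\ge\; a \|\theta\|^2 - b \qquad \text{for some } a, b > 0,
\]
under which the iterates $(\theta_k)$ form a Markov chain with a unique invariant distribution $\pi_\gamma$. The asymptotic normality result then takes the form of a central limit theorem for the averaged iterates,
\[
\sqrt{n}\left( \frac{1}{n}\sum_{k=1}^{n} \theta_k \;-\; \bar{\theta}_\gamma \right) \;\xrightarrow{d}\; \mathcal{N}(0, \Sigma_\gamma),
\qquad \bar{\theta}_\gamma := \mathbb{E}_{\pi_\gamma}[\theta],
\]
while the bias result controls the distance between $\bar{\theta}_\gamma$ and a critical point $\theta^*$ of $f$ (i.e., a point with $\nabla f(\theta^*) = 0$). Combining the two, an approximate confidence interval for $\theta^*$ can in principle be built from the averaged iterates, an estimate of $\Sigma_\gamma$, and the bias bound.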