Title

Stopping Criteria for, and Strong Convergence of, Stochastic Gradient Descent on Bottou-Curtis-Nocedal Functions

Authors

Patel, Vivak

Abstract


Stopping criteria for Stochastic Gradient Descent (SGD) methods play important roles from enabling adaptive step size schemes to providing rigor for downstream analyses such as asymptotic inference. Unfortunately, current stopping criteria for SGD methods are often heuristics that rely on asymptotic normality results or convergence to stationary distributions, which may fail to exist for nonconvex functions and, thereby, limit the applicability of such stopping criteria. To address this issue, in this work, we rigorously develop two stopping criteria for SGD that can be applied to a broad class of nonconvex functions, which we term Bottou-Curtis-Nocedal functions. Moreover, as a prerequisite for developing these stopping criteria, we prove that the gradient function evaluated at SGD's iterates converges strongly to zero for Bottou-Curtis-Nocedal functions, which addresses an open question in the SGD literature. As a result of our work, our rigorously developed stopping criteria can be used to develop new adaptive step size schemes or bolster other downstream analyses for nonconvex functions.
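To make the setting concrete, the sketch below shows the generic shape of an SGD loop equipped with a gradient-based stopping rule on a simple nonconvex objective. This is purely illustrative: the objective, the windowed-average rule, and all parameter values are assumptions for the example, not the criteria developed in the paper.

```python
import numpy as np

# Illustrative nonconvex objective: f(x) = x^4/4 - x^2/2,
# with stationary points at x = 0 and x = +/-1.
def grad_estimate(x, rng, noise=0.1):
    """Stochastic gradient: true gradient x^3 - x plus Gaussian noise."""
    return x**3 - x + rng.normal(scale=noise)

def sgd_with_stopping(x0, step=0.05, tol=0.02, window=200,
                      max_iter=100_000, seed=0):
    """Run SGD, stopping when a windowed average of squared stochastic
    gradients drops below tol. Note tol must exceed the gradient-noise
    variance (here 0.1**2 = 0.01), or the rule can never trigger.
    This rule is a hypothetical stand-in for the paper's criteria."""
    rng = np.random.default_rng(seed)
    x = x0
    recent = []  # sliding window of squared stochastic-gradient values
    for k in range(max_iter):
        g = grad_estimate(x, rng)
        recent.append(g**2)
        if len(recent) > window:
            recent.pop(0)
        if len(recent) == window and np.mean(recent) < tol:
            return x, k  # stopping criterion met
        x = x - step * g
    return x, max_iter  # budget exhausted without triggering
```

Starting from x0 = 2.0, the iterates settle near the local minimizer x = 1, and the windowed rule fires once the early large gradients have slid out of the window.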
