Paper Title

Cyclic Block Coordinate Descent With Variance Reduction for Composite Nonconvex Optimization

Authors

Xufeng Cai, Chaobing Song, Stephen J. Wright, Jelena Diakonikolas

Abstract

Nonconvex optimization is central in solving many machine learning problems, in which block-wise structure is commonly encountered. In this work, we propose cyclic block coordinate methods for nonconvex optimization problems with non-asymptotic gradient norm guarantees. Our convergence analysis is based on a gradient Lipschitz condition with respect to a Mahalanobis norm, inspired by recent progress on cyclic block coordinate methods. In deterministic settings, our convergence guarantee matches the guarantee of (full-gradient) gradient descent, but with the gradient Lipschitz constant defined w.r.t. a Mahalanobis norm. In stochastic settings, we use recursive variance reduction to decrease the per-iteration cost and match the arithmetic operation complexity of current optimal stochastic full-gradient methods, with a unified analysis for both finite-sum and infinite-sum cases. We prove a faster linear convergence result when a Polyak-Łojasiewicz (PŁ) condition holds. To our knowledge, this work is the first to provide non-asymptotic convergence guarantees -- variance-reduced or not -- for a cyclic block coordinate method in general composite (smooth + nonsmooth) nonconvex settings. Our experimental results demonstrate the efficacy of the proposed cyclic scheme in training deep neural nets.
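To make the cyclic update pattern concrete, below is a minimal Python sketch of deterministic cyclic block coordinate descent on a smooth objective. It is an illustrative sketch only: the names (`cyclic_bcd`, `grad`, `blocks`, `steps`) are hypothetical, and the paper's actual method additionally handles a nonsmooth composite term (via proximal-type steps) and uses block partial gradients rather than recomputing the full gradient.

```python
import numpy as np

def cyclic_bcd(grad, x0, blocks, steps, n_epochs=100):
    """Deterministic cyclic block coordinate descent (illustrative sketch).

    grad   : callable returning the full gradient of a smooth objective at x
    x0     : initial point (1-D numpy array)
    blocks : list of index arrays partitioning the coordinates of x
    steps  : per-block step sizes, e.g. inverse block-wise Lipschitz constants
    """
    x = x0.copy()
    for _ in range(n_epochs):
        # One pass updates each block in a fixed cyclic order; blocks updated
        # earlier in the pass feed into the gradients used for later blocks.
        for idx, step in zip(blocks, steps):
            g = grad(x)  # in practice only the block partial gradient is needed
            x[idx] -= step * g[idx]
    return x

# Example: separable quadratic f(x) = 0.5 * ||x||^2 with two coordinate blocks.
if __name__ == "__main__":
    grad = lambda x: x
    x0 = np.array([1.0, -2.0, 3.0, -4.0])
    blocks = [np.array([0, 1]), np.array([2, 3])]
    x = cyclic_bcd(grad, x0, blocks, steps=[0.5, 0.5], n_epochs=50)
    print(x)  # converges toward the minimizer at 0
```

The "recursive variance reduction" mentioned in the stochastic setting refers to estimators that periodically restart from a full gradient and then correct the estimate recursively with single-component gradient differences. The sketch below shows a generic SARAH-style inner loop under that reading; it is not the paper's block-wise variant, and `grad_i`, `grad_full`, and `inner_len` are assumed names.

```python
def sarah_inner_loop(grad_i, grad_full, x, n, step, inner_len, rng):
    """Generic SARAH-style recursive variance reduction (illustrative sketch).

    grad_i(x, i) : gradient of the i-th component function at x
    grad_full(x) : full (or large-batch) gradient at x, used at restarts
    """
    v = grad_full(x)               # anchor the estimate at a full gradient
    x_prev = x.copy()
    x = x - step * v
    for _ in range(inner_len):
        i = rng.integers(n)        # sample one component uniformly
        v = grad_i(x, i) - grad_i(x_prev, i) + v  # recursive correction
        x_prev = x.copy()
        x = x - step * v
    return x
```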
