Paper Title

Cyclic Block Coordinate Descent With Variance Reduction for Composite Nonconvex Optimization

Authors

Xufeng Cai, Chaobing Song, Stephen J. Wright, Jelena Diakonikolas

Abstract

Nonconvex optimization is central in solving many machine learning problems, in which block-wise structure is commonly encountered. In this work, we propose cyclic block coordinate methods for nonconvex optimization problems with non-asymptotic gradient norm guarantees. Our convergence analysis is based on a gradient Lipschitz condition with respect to a Mahalanobis norm, inspired by recent progress on cyclic block coordinate methods. In deterministic settings, our convergence guarantee matches the guarantee of (full-gradient) gradient descent, but with the gradient Lipschitz constant defined w.r.t. a Mahalanobis norm. In stochastic settings, we use recursive variance reduction to decrease the per-iteration cost and match the arithmetic operation complexity of current optimal stochastic full-gradient methods, with a unified analysis for both finite-sum and infinite-sum cases. We prove a faster linear convergence result when a Polyak-Łojasiewicz (PŁ) condition holds. To our knowledge, this work is the first to provide non-asymptotic convergence guarantees -- variance-reduced or not -- for a cyclic block coordinate method in general composite (smooth + nonsmooth) nonconvex settings. Our experimental results demonstrate the efficacy of the proposed cyclic scheme in training deep neural nets.
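To make the cyclic update pattern concrete, below is a minimal Python sketch of deterministic cyclic block coordinate descent on a smooth objective. It is an illustrative sketch only: the names (`cyclic_bcd`, `grad`, `blocks`, `steps`) are hypothetical, and the paper's actual method additionally handles a nonsmooth composite term (via proximal-type steps) and uses block partial gradients rather than recomputing the full gradient.

```python
import numpy as np

def cyclic_bcd(grad, x0, blocks, steps, n_epochs=100):
    """Deterministic cyclic block coordinate descent (illustrative sketch).

    grad   : callable returning the full gradient of a smooth objective at x
    x0     : initial point (1-D numpy array)
    blocks : list of index arrays partitioning the coordinates of x
    steps  : per-block step sizes, e.g. inverse block-wise Lipschitz constants
    """
    x = x0.copy()
    for _ in range(n_epochs):
        # One pass updates each block in a fixed cyclic order; blocks updated
        # earlier in the pass feed into the gradients used for later blocks.
        for idx, step in zip(blocks, steps):
            g = grad(x)  # in practice only the block partial gradient is needed
            x[idx] -= step * g[idx]
    return x

# Example: separable quadratic f(x) = 0.5 * ||x||^2 with two coordinate blocks.
if __name__ == "__main__":
    grad = lambda x: x
    x0 = np.array([1.0, -2.0, 3.0, -4.0])
    blocks = [np.array([0, 1]), np.array([2, 3])]
    x = cyclic_bcd(grad, x0, blocks, steps=[0.5, 0.5], n_epochs=50)
    print(x)  # converges toward the minimizer at 0
```

The "recursive variance reduction" mentioned in the stochastic setting refers to estimators that periodically restart from a full gradient and then correct the estimate recursively with single-component gradient differences. The sketch below shows a generic SARAH-style inner loop under that reading; it is not the paper's block-wise variant, and `grad_i`, `grad_full`, and `inner_len` are assumed names.

```python
def sarah_inner_loop(grad_i, grad_full, x, n, step, inner_len, rng):
    """Generic SARAH-style recursive variance reduction (illustrative sketch).

    grad_i(x, i) : gradient of the i-th component function at x
    grad_full(x) : full (or large-batch) gradient at x, used at restarts
    """
    v = grad_full(x)               # anchor the estimate at a full gradient
    x_prev = x.copy()
    x = x - step * v
    for _ in range(inner_len):
        i = rng.integers(n)        # sample one component uniformly
        v = grad_i(x, i) - grad_i(x_prev, i) + v  # recursive correction
        x_prev = x.copy()
        x = x - step * v
    return x
```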
