Paper Title

Decomposable Non-Smooth Convex Optimization with Nearly-Linear Gradient Oracle Complexity

Authors

Sally Dong, Haotian Jiang, Yin Tat Lee, Swati Padmanabhan, Guanghao Ye

Abstract

Many fundamental problems in machine learning can be formulated by the convex program \[ \min_{\theta \in \mathbb{R}^d}\ \sum_{i=1}^{n} f_{i}(\theta), \] where each $f_i$ is a convex, Lipschitz function supported on a subset of $d_i$ coordinates of $\theta$. One common approach to this problem, exemplified by stochastic gradient descent, involves sampling one $f_i$ term at every iteration to make progress. This approach crucially relies on a notion of uniformity across the $f_i$'s, formally captured by their condition number. In this work, we give an algorithm that minimizes the above convex formulation to $\varepsilon$-accuracy in $\widetilde{O}(\sum_{i=1}^n d_i \log(1/\varepsilon))$ gradient computations, with no assumptions on the condition number. The previous best algorithm independent of the condition number is the standard cutting plane method, which requires $O(nd \log(1/\varepsilon))$ gradient computations. As a corollary, we improve upon the evaluation oracle complexity for decomposable submodular minimization by Axiotis et al. (ICML 2021). Our main technical contribution is an adaptive procedure to select an $f_i$ term at every iteration via a novel combination of cutting-plane and interior-point methods.
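
To make the decomposable structure concrete, here is a minimal stochastic-subgradient sketch (not the paper's algorithm) in which each $f_i$ depends only on its own block of $d_i$ coordinates and one term is sampled per iteration, as in the baseline approach the abstract describes. The blocks, step size, and specific choice of $f_i$ below are illustrative assumptions.

```python
import numpy as np

# Decomposable objective: sum_i f_i(theta), where f_i touches only d_i coordinates.
# One f_i is sampled per iteration, costing a single (sub)gradient computation.

rng = np.random.default_rng(0)

d = 10                                    # total dimension of theta
blocks = [np.array([0, 1, 2]),            # f_1 depends on coordinates {0,1,2}   (d_1 = 3)
          np.array([3, 4]),               # f_2 depends on coordinates {3,4}     (d_2 = 2)
          np.array([5, 6, 7, 8, 9])]      # f_3 depends on coordinates {5,...,9} (d_3 = 5)
targets = [rng.standard_normal(len(S)) for S in blocks]

def f_i(theta, i):
    """Each f_i is convex and Lipschitz: an l1 distance on its own block (illustrative)."""
    S = blocks[i]
    return np.abs(theta[S] - targets[i]).sum()

def subgrad_i(theta, i):
    """A subgradient of f_i; it is nonzero only on the d_i coordinates of block i."""
    g = np.zeros(d)
    S = blocks[i]
    g[S] = np.sign(theta[S] - targets[i])
    return g

theta = np.zeros(d)
for t in range(1, 2001):
    i = rng.integers(len(blocks))                        # sample one term per iteration
    theta -= (0.5 / np.sqrt(t)) * subgrad_i(theta, i)    # one gradient computation

print("objective value:", sum(f_i(theta, i) for i in range(len(blocks))))
```

The convergence rate of such a sampling scheme depends on uniformity across the $f_i$'s (their condition number); the paper's contribution is an adaptive selection rule whose total gradient-oracle cost, $\widetilde{O}(\sum_i d_i \log(1/\varepsilon))$, carries no such dependence.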
