Paper Title

Decomposable Non-Smooth Convex Optimization with Nearly-Linear Gradient Oracle Complexity

Authors

Sally Dong, Haotian Jiang, Yin Tat Lee, Swati Padmanabhan, Guanghao Ye

Abstract

Many fundamental problems in machine learning can be formulated by the convex program \[ \min_{\theta \in \mathbb{R}^d}\ \sum_{i=1}^{n} f_{i}(\theta), \] where each $f_i$ is a convex, Lipschitz function supported on a subset of $d_i$ coordinates of $\theta$. One common approach to this problem, exemplified by stochastic gradient descent, involves sampling one $f_i$ term at every iteration to make progress. This approach crucially relies on a notion of uniformity across the $f_i$'s, formally captured by their condition number. In this work, we give an algorithm that minimizes the above convex formulation to $\varepsilon$-accuracy in $\widetilde{O}(\sum_{i=1}^n d_i \log(1/\varepsilon))$ gradient computations, with no assumptions on the condition number. The previous best algorithm independent of the condition number is the standard cutting plane method, which requires $O(nd \log(1/\varepsilon))$ gradient computations. As a corollary, we improve upon the evaluation oracle complexity for decomposable submodular minimization by Axiotis et al. (ICML 2021). Our main technical contribution is an adaptive procedure to select an $f_i$ term at every iteration via a novel combination of cutting-plane and interior-point methods.
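
To make the decomposable structure concrete, here is a minimal stochastic-subgradient sketch (not the paper's algorithm) in which each $f_i$ depends only on its own block of $d_i$ coordinates and one term is sampled per iteration, as in the baseline approach the abstract describes. The blocks, step size, and specific choice of $f_i$ below are illustrative assumptions.

```python
import numpy as np

# Decomposable objective: sum_i f_i(theta), where f_i touches only d_i coordinates.
# One f_i is sampled per iteration, costing a single (sub)gradient computation.

rng = np.random.default_rng(0)

d = 10                                    # total dimension of theta
blocks = [np.array([0, 1, 2]),            # f_1 depends on coordinates {0,1,2}   (d_1 = 3)
          np.array([3, 4]),               # f_2 depends on coordinates {3,4}     (d_2 = 2)
          np.array([5, 6, 7, 8, 9])]      # f_3 depends on coordinates {5,...,9} (d_3 = 5)
targets = [rng.standard_normal(len(S)) for S in blocks]

def f_i(theta, i):
    """Each f_i is convex and Lipschitz: an l1 distance on its own block (illustrative)."""
    S = blocks[i]
    return np.abs(theta[S] - targets[i]).sum()

def subgrad_i(theta, i):
    """A subgradient of f_i; it is nonzero only on the d_i coordinates of block i."""
    g = np.zeros(d)
    S = blocks[i]
    g[S] = np.sign(theta[S] - targets[i])
    return g

theta = np.zeros(d)
for t in range(1, 2001):
    i = rng.integers(len(blocks))                        # sample one term per iteration
    theta -= (0.5 / np.sqrt(t)) * subgrad_i(theta, i)    # one gradient computation

print("objective value:", sum(f_i(theta, i) for i in range(len(blocks))))
```

The convergence rate of such a sampling scheme depends on uniformity across the $f_i$'s (their condition number); the paper's contribution is an adaptive selection rule whose total gradient-oracle cost, $\widetilde{O}(\sum_i d_i \log(1/\varepsilon))$, carries no such dependence.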
