Title
On Optimal Universal First-Order Methods for Minimizing Heterogeneous Sums
Authors
Abstract
This work considers minimizing a sum of convex functions, each with potentially different structure ranging from nonsmooth to smooth, Lipschitz to non-Lipschitz. Nesterov's universal fast gradient method provides an optimal black-box first-order method for minimizing a single function that takes advantage of any continuity structure present without requiring prior knowledge. In this paper, we show that this landmark method (without modification) further adapts to heterogeneous sums. For example, it minimizes the sum of a nonsmooth $M$-Lipschitz function and an $L$-smooth function at a rate of $O(M^2/\varepsilon^2 + \sqrt{L/\varepsilon})$ without knowledge of $M$, $L$, or even that the objective is a sum of two terms. This rate is precisely the sum of the optimal convergence rates for each term's individual complexity class. More generally, we show that sums of varied Hölder smooth functions introduce no new complexities, requiring at most as many iterations as are needed to minimize each summand separately. Extensions to strongly convex and Hölder growth settings, as well as simple matching lower bounds, are also provided.
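The method the abstract refers to can be summarized as an accelerated gradient scheme with a backtracking line search on an inexact descent condition: the quadratic model is allowed an $\varepsilon$-dependent slack, which is what lets one estimated smoothness constant cover nonsmooth, Hölder-smooth, and smooth terms alike. Below is a minimal pure-Python sketch of such a universal fast gradient iteration, not the paper's exact pseudocode; the oracle names `f` and `grad`, the default constants, and the test problem are illustrative assumptions.

```python
import math

def universal_fgm(f, grad, x0, eps, L0=1.0, iters=2000):
    """Sketch of a universal fast gradient method (after Nesterov).

    f / grad are value and (sub)gradient oracles; eps is the target
    accuracy.  The acceptance test below tolerates an eps-dependent
    slack, so the same quadratic model handles nonsmooth and smooth
    summands without knowing their constants.
    """
    n = len(x0)
    y, z = list(x0), list(x0)   # primal estimate and dual/averaging point
    A, L = 0.0, L0              # accumulated weight and smoothness estimate
    for _ in range(iters):
        while True:
            # a solves L * a^2 = A + a (the accelerated step-size equation)
            a = (1.0 + math.sqrt(1.0 + 4.0 * L * A)) / (2.0 * L)
            A_new, tau = A + a, a / (A + a)
            x = [tau * z[i] + (1.0 - tau) * y[i] for i in range(n)]
            g = grad(x)
            z_new = [z[i] - a * g[i] for i in range(n)]   # step on the dual point
            y_new = [tau * z_new[i] + (1.0 - tau) * y[i] for i in range(n)]
            d = [y_new[i] - x[i] for i in range(n)]
            lin = sum(g[i] * d[i] for i in range(n))
            sq = sum(di * di for di in d)
            # inexact descent test with (eps/2) * tau slack
            if f(y_new) <= f(x) + lin + 0.5 * L * sq + 0.5 * eps * tau:
                break
            L *= 2.0                                       # backtrack: double L
        y, z, A, L = y_new, z_new, A_new, L / 2.0
    return y
```

A heterogeneous-sum example in one dimension: minimizing $|x| + \tfrac{1}{2}(x-3)^2$ (a $1$-Lipschitz nonsmooth term plus a $1$-smooth term, minimized at $x^\star = 2$ with value $2.5$) requires passing only the oracles, with no smoothness constants supplied.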