Paper Title

Can speed up the convergence rate of stochastic gradient methods to $\mathcal{O}(1/k^2)$ by a gradient averaging strategy?

Authors

Xin Xu, Xiaopeng Luo

Abstract

In this paper we consider the question of whether it is possible to apply a gradient averaging strategy to improve on the sublinear convergence rates without any increase in storage. Our analysis reveals that a positive answer requires an appropriate averaging strategy and iterations that satisfy the variance dominant condition. As an interesting fact, we show that if the iterative variance we define remains dominant, even slightly, throughout the stochastic gradient iterations, the proposed gradient averaging strategy can improve the convergence rate from $\mathcal{O}(1/k)$ to $\mathcal{O}(1/k^2)$ in probability for strongly convex objectives with Lipschitz gradients. This conclusion suggests how we should control the stochastic gradient iterations to improve the rate of convergence.
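The abstract does not spell out the averaging rule or the variance dominant condition, so the following is only a minimal sketch of the general idea: stochastic gradient descent on a strongly convex quadratic, compared against a variant whose update direction is a running average of all past stochastic gradients, kept in $\mathcal{O}(d)$ storage. The quadratic objective, the $1/k$ step size, and the noise model are illustrative assumptions, not the paper's setting or its actual algorithm.

```python
import numpy as np

# Hypothetical sketch: plain SGD vs. a gradient-averaging variant on the
# strongly convex quadratic f(x) = 0.5 * x' A x (gradient A x), with additive
# Gaussian gradient noise. Not the authors' scheme; for illustration only.
rng = np.random.default_rng(0)
d = 10
A = np.diag(np.linspace(1.0, 5.0, d))   # eigenvalues in [1, 5] => strongly convex

x0 = rng.standard_normal(d)
x_sgd = x0.copy()                        # plain SGD iterate
x_avg = x0.copy()                        # gradient-averaging iterate
g_bar = np.zeros(d)                      # running average of past stochastic gradients

for k in range(1, 10_001):
    noise = 0.1 * rng.standard_normal(d)
    step = 1.0 / k                       # classical O(1/k) step size

    # Plain SGD: move along the current noisy gradient.
    x_sgd = x_sgd - step * (A @ x_sgd + noise)

    # Gradient averaging: update the running mean of noisy gradients
    # (constant extra storage) and move along that average instead.
    g_bar = ((k - 1) * g_bar + (A @ x_avg + noise)) / k
    x_avg = x_avg - step * g_bar

print("plain SGD      ||x||^2:", float(np.sum(x_sgd ** 2)))
print("grad averaging ||x||^2:", float(np.sum(x_avg ** 2)))
```

In this toy setup the averaged-gradient iterate typically ends up closer to the minimizer at the origin, which is the qualitative behaviour the abstract attributes to a suitable averaging strategy; the actual $\mathcal{O}(1/k^2)$ rate in the paper depends on its specific strategy and on the variance dominant condition holding.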
