Paper Title

Stochastic Variance-Reduced Newton: Accelerating Finite-Sum Minimization with Large Batches

Paper Authors

Dereziński, Michał

Paper Abstract

Stochastic variance reduction has proven effective at accelerating first-order algorithms for solving convex finite-sum optimization tasks such as empirical risk minimization. Incorporating second-order information has proven helpful in further improving the performance of these first-order methods. Yet, comparatively little is known about the benefits of using variance reduction to accelerate popular stochastic second-order methods such as Subsampled Newton. To address this, we propose Stochastic Variance-Reduced Newton (SVRN), a finite-sum minimization algorithm that provably accelerates existing stochastic Newton methods from $O(\alpha\log(1/\epsilon))$ to $O\big(\frac{\log(1/\epsilon)}{\log(n)}\big)$ passes over the data, i.e., by a factor of $O(\alpha\log(n))$, where $n$ is the number of sum components and $\alpha$ is the approximation factor in the Hessian estimate. Surprisingly, this acceleration gets more significant the larger the data size $n$, which is a unique property of SVRN. Our algorithm retains the key advantages of Newton-type methods, such as easily parallelizable large-batch operations and a simple unit step size. We use SVRN to accelerate Subsampled Newton and Iterative Hessian Sketch algorithms, and show that it compares favorably to popular first-order methods with variance reduction.
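
To make the abstract's description more concrete, below is a minimal NumPy sketch of a variance-reduced Newton-type update in the spirit of SVRN: an outer loop computes the full gradient and a subsampled Hessian at an anchor point, and inner iterations take unit-step Newton-type steps with large-batch, variance-reduced gradient estimates. This is not the authors' reference implementation; the function names, batch sizes, iteration counts, and the toy least-squares problem are illustrative assumptions.

```python
# Minimal sketch (assumption, not the paper's reference code) of a
# variance-reduced Newton-type update as described in the abstract.
import numpy as np


def svrn_sketch(grad_i, hess_i, x0, n, outer_iters=10, inner_iters=5,
                hessian_batch=200, grad_batch=1000, seed=None):
    """grad_i(x, idx) / hess_i(x, idx): gradient / Hessian of the objective
    averaged over the component index set idx; n: number of sum components."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    all_idx = np.arange(n)
    for _ in range(outer_iters):
        x_anchor = x.copy()
        full_grad = grad_i(x_anchor, all_idx)            # full gradient at anchor point
        h_idx = rng.choice(n, size=min(hessian_batch, n), replace=False)
        H = hess_i(x_anchor, h_idx)                      # subsampled Hessian estimate
        for _ in range(inner_iters):
            b_idx = rng.choice(n, size=min(grad_batch, n), replace=False)
            # Variance-reduced gradient: large-batch gradient recentered by
            # the anchor-point correction (SVRG-style estimator).
            g = grad_i(x, b_idx) - grad_i(x_anchor, b_idx) + full_grad
            x = x - np.linalg.solve(H, g)                # Newton-type step, unit step size
    return x


# Toy usage (illustrative): least-squares finite sum
#   f(x) = (1/n) * sum_i (a_i^T x - b_i)^2 / 2.
A = np.random.default_rng(0).standard_normal((10000, 20))
b = A @ np.ones(20)
grad_i = lambda x, idx: A[idx].T @ (A[idx] @ x - b[idx]) / len(idx)
hess_i = lambda x, idx: A[idx].T @ A[idx] / len(idx)
x_hat = svrn_sketch(grad_i, hess_i, np.zeros(20), n=len(b), seed=0)
```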
