Paper Title

Is Local SGD Better than Minibatch SGD?

Authors

Blake Woodworth, Kumar Kshitij Patel, Sebastian U. Stich, Zhen Dai, Brian Bullins, H. Brendan McMahan, Ohad Shamir, Nathan Srebro

Abstract

We study local SGD (also known as parallel SGD and federated averaging), a natural and frequently used stochastic distributed optimization method. Its theoretical foundations are currently lacking and we highlight how all existing error guarantees in the convex setting are dominated by a simple baseline, minibatch SGD. (1) For quadratic objectives we prove that local SGD strictly dominates minibatch SGD and that accelerated local SGD is minimax optimal for quadratics; (2) For general convex objectives we provide the first guarantee that at least sometimes improves over minibatch SGD; (3) We show that indeed local SGD does not dominate minibatch SGD by presenting a lower bound on the performance of local SGD that is worse than the minibatch SGD guarantee.
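To make the comparison concrete, here is a minimal sketch of the two methods on a toy quadratic objective with Gaussian gradient noise. All names and values (`M`, `K`, `R`, `lr`, `noise_std`, the objective itself) are illustrative assumptions, not taken from the paper; both methods are given the same budget of R communication rounds and M*K stochastic gradients per round, matching the equal-budget comparison the abstract describes.

```python
import numpy as np

# Illustrative comparison (not the paper's experiments): local SGD vs.
# minibatch SGD on f(x) = 0.5 * ||x||^2 with additive Gaussian gradient noise.

rng = np.random.default_rng(0)
d = 10           # dimension
M = 8            # number of parallel workers
K = 5            # local steps per communication round
R = 50           # communication rounds
lr = 0.1         # step size
noise_std = 1.0  # std of stochastic gradient noise

def stoch_grad(x):
    # Gradient of 0.5*||x||^2 plus noise: a stand-in for a stochastic oracle.
    return x + noise_std * rng.standard_normal(d)

def minibatch_sgd(x0):
    # Baseline: R steps, each averaging all M*K stochastic gradients.
    x = x0.copy()
    for _ in range(R):
        g = np.mean([stoch_grad(x) for _ in range(M * K)], axis=0)
        x -= lr * g
    return x

def local_sgd(x0):
    # Each of the M workers takes K local SGD steps, then the iterates are
    # averaged (one communication per round), repeated for R rounds.
    x = x0.copy()
    for _ in range(R):
        local_iterates = []
        for _ in range(M):
            y = x.copy()
            for _ in range(K):
                y -= lr * stoch_grad(y)
            local_iterates.append(y)
        x = np.mean(local_iterates, axis=0)
    return x

x0 = np.ones(d)
print("minibatch SGD loss:", 0.5 * np.sum(minibatch_sgd(x0) ** 2))
print("local SGD loss:    ", 0.5 * np.sum(local_sgd(x0) ** 2))
```

The key structural difference is where the averaging happens: minibatch SGD averages gradients at every step, while local SGD averages iterates only once per round, trading communication for locally stale updates.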
