Paper Title

Hogwild! over Distributed Local Data Sets with Linearly Increasing Mini-Batch Sizes

Paper Authors

Marten van Dijk, Nhuong V. Nguyen, Toan N. Nguyen, Lam M. Nguyen, Quoc Tran-Dinh, Phuong Ha Nguyen

Paper Abstract

Hogwild! implements asynchronous Stochastic Gradient Descent (SGD) where multiple threads access, in parallel, a common repository containing training data, perform SGD iterations, and update shared state that represents a jointly learned (global) model. We consider big data analysis where training data is distributed among local data sets in a heterogeneous way -- and we wish to move SGD computations to the local compute nodes where the local data resides. The results of these local SGD computations are aggregated by a central "aggregator" which mimics Hogwild!. We show how local compute nodes can start by choosing small mini-batch sizes which increase to larger ones in order to reduce communication cost (rounds of interaction with the aggregator). We improve on the state-of-the-art literature and show $O(\sqrt{K})$ communication rounds for heterogeneous data for strongly convex problems, where $K$ is the total number of gradient computations across all local compute nodes. For our scheme, we prove a \textit{tight} and novel non-trivial convergence analysis for strongly convex problems for {\em heterogeneous} data which does not use the bounded gradient assumption as seen in many existing publications. The tightness is a consequence of our proofs for lower and upper bounds of the convergence rate, which show a constant factor difference. We show experimental results for plain convex and non-convex problems for biased (i.e., heterogeneous) and unbiased local data sets.
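For intuition on the $O(\sqrt{K})$ claim: if the mini-batch size in round $t$ grows linearly, say $s_t = a \cdot t$, then $T$ rounds perform $K = \sum_{t=1}^{T} a t = \Theta(T^2)$ gradient computations in total, so only $T = \Theta(\sqrt{K})$ communication rounds are needed for $K$ gradient computations.

Below is a minimal runnable sketch, not the paper's algorithm or notation, of the overall scheme: several compute nodes with heterogeneous local data run local SGD rounds with linearly increasing mini-batch sizes, and a central aggregator merges their results. All names (`make_local_data`, `local_sgd_round`), the least-squares objective, the synchronous averaging used as a stand-in for the Hogwild!-style asynchronous aggregator, and the step-size and growth constants are illustrative assumptions.

```python
import numpy as np

def make_local_data(rng, n, d, w_star, shift):
    # Heterogeneous (biased) local data: each node's inputs are shifted by a
    # node-specific offset, so the local objectives differ across nodes.
    X = rng.normal(loc=shift, scale=1.0, size=(n, d))
    y = X @ w_star + 0.1 * rng.normal(size=n)
    return X, y

def local_sgd_round(w, X, y, batch_size, step, rng):
    # One communication round at one node: run `batch_size` SGD iterations on
    # samples drawn from the local data set, starting from the global model w.
    for _ in range(batch_size):
        i = rng.integers(len(y))
        grad = (X[i] @ w - y[i]) * X[i]  # gradient of 0.5 * (x^T w - y)^2
        w = w - step * grad
    return w

rng = np.random.default_rng(0)
d, n_nodes = 5, 4
w_star = rng.normal(size=d)
nodes = [make_local_data(rng, 200, d, w_star, shift)
         for shift in np.linspace(-1.0, 1.0, n_nodes)]

w = np.zeros(d)
for t in range(1, 21):
    batch = 2 * t  # linearly increasing mini-batch size: fewer, larger rounds
    # The aggregator collects the nodes' local results and averages them
    # (a synchronous stand-in for Hogwild!-style asynchronous updates).
    results = [local_sgd_round(w, X, y, batch, step=0.01, rng=rng)
               for X, y in nodes]
    w = np.mean(results, axis=0)
    print(f"round {t:2d}  batch {batch:3d}  ||w - w*|| = {np.linalg.norm(w - w_star):.4f}")
```

Under this linear schedule, 20 rounds already account for $\sum_{t=1}^{20} 2t = 420$ gradient computations per node, illustrating how communication rounds grow only as the square root of total gradient work.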
