Paper Title

Hogwild! over Distributed Local Data Sets with Linearly Increasing Mini-Batch Sizes

Paper Authors

Marten van Dijk, Nhuong V. Nguyen, Toan N. Nguyen, Lam M. Nguyen, Quoc Tran-Dinh, Phuong Ha Nguyen

Paper Abstract

Hogwild! implements asynchronous Stochastic Gradient Descent (SGD) where multiple threads access, in parallel, a common repository containing training data, perform SGD iterations, and update shared state that represents a jointly learned (global) model. We consider big data analysis where training data is distributed among local data sets in a heterogeneous way -- and we wish to move SGD computations to the local compute nodes where the local data resides. The results of these local SGD computations are aggregated by a central "aggregator" which mimics Hogwild!. We show how local compute nodes can start by choosing small mini-batch sizes which increase to larger ones in order to reduce communication cost (rounds of interaction with the aggregator). We improve on the state-of-the-art literature and show $O(\sqrt{K})$ communication rounds for heterogeneous data for strongly convex problems, where $K$ is the total number of gradient computations across all local compute nodes. For our scheme, we prove a \textit{tight} and novel non-trivial convergence analysis for strongly convex problems for {\em heterogeneous} data which does not use the bounded gradient assumption as seen in many existing publications. The tightness is a consequence of our proofs for lower and upper bounds of the convergence rate, which show a constant factor difference. We show experimental results for plain convex and non-convex problems for biased (i.e., heterogeneous) and unbiased local data sets.
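For intuition on the $O(\sqrt{K})$ claim: if the mini-batch size in round $t$ grows linearly, say $s_t = a \cdot t$, then $T$ rounds perform $K = \sum_{t=1}^{T} a t = \Theta(T^2)$ gradient computations in total, so only $T = \Theta(\sqrt{K})$ communication rounds are needed for $K$ gradient computations.

Below is a minimal runnable sketch, not the paper's algorithm or notation, of the overall scheme: several compute nodes with heterogeneous local data run local SGD rounds with linearly increasing mini-batch sizes, and a central aggregator merges their results. All names (`make_local_data`, `local_sgd_round`), the least-squares objective, the synchronous averaging used as a stand-in for the Hogwild!-style asynchronous aggregator, and the step-size and growth constants are illustrative assumptions.

```python
import numpy as np

def make_local_data(rng, n, d, w_star, shift):
    # Heterogeneous (biased) local data: each node's inputs are shifted by a
    # node-specific offset, so the local objectives differ across nodes.
    X = rng.normal(loc=shift, scale=1.0, size=(n, d))
    y = X @ w_star + 0.1 * rng.normal(size=n)
    return X, y

def local_sgd_round(w, X, y, batch_size, step, rng):
    # One communication round at one node: run `batch_size` SGD iterations on
    # samples drawn from the local data set, starting from the global model w.
    for _ in range(batch_size):
        i = rng.integers(len(y))
        grad = (X[i] @ w - y[i]) * X[i]  # gradient of 0.5 * (x^T w - y)^2
        w = w - step * grad
    return w

rng = np.random.default_rng(0)
d, n_nodes = 5, 4
w_star = rng.normal(size=d)
nodes = [make_local_data(rng, 200, d, w_star, shift)
         for shift in np.linspace(-1.0, 1.0, n_nodes)]

w = np.zeros(d)
for t in range(1, 21):
    batch = 2 * t  # linearly increasing mini-batch size: fewer, larger rounds
    # The aggregator collects the nodes' local results and averages them
    # (a synchronous stand-in for Hogwild!-style asynchronous updates).
    results = [local_sgd_round(w, X, y, batch, step=0.01, rng=rng)
               for X, y in nodes]
    w = np.mean(results, axis=0)
    print(f"round {t:2d}  batch {batch:3d}  ||w - w*|| = {np.linalg.norm(w - w_star):.4f}")
```

Under this linear schedule, 20 rounds already account for $\sum_{t=1}^{20} 2t = 420$ gradient computations per node, illustrating how communication rounds grow only as the square root of total gradient work.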
