Paper Title
SWIFT: Rapid Decentralized Federated Learning via Wait-Free Model Communication
Paper Authors
Paper Abstract
The decentralized Federated Learning (FL) setting avoids the role of a potentially unreliable or untrustworthy central host by utilizing groups of clients to collaboratively train a model via localized training and model/gradient sharing. Most existing decentralized FL algorithms require synchronization of client models, where the speed of synchronization depends upon the slowest client. In this work, we propose SWIFT: a novel wait-free decentralized FL algorithm that allows clients to conduct training at their own speed. Theoretically, we prove that SWIFT matches the gold-standard iteration convergence rate $\mathcal{O}(1/\sqrt{T})$ of parallel stochastic gradient descent for convex and non-convex smooth optimization (total iterations $T$). Furthermore, we provide theoretical results for IID and non-IID settings without any bounded-delay assumption for slow clients, which is required by other asynchronous decentralized FL algorithms. Although SWIFT achieves the same iteration convergence rate with respect to $T$ as other state-of-the-art (SOTA) parallel stochastic algorithms, it converges faster with respect to run-time due to its wait-free structure. Our experimental results demonstrate that SWIFT's run-time is reduced due to a large reduction in communication time per epoch, which falls by an order of magnitude compared to synchronous counterparts. Furthermore, SWIFT produces loss levels for image classification, over IID and non-IID data settings, upwards of 50% faster than existing SOTA algorithms.