Paper Title
Communication-Efficient Decentralized Learning with Sparsification and Adaptive Peer Selection
Paper Authors
Paper Abstract
Distributed learning techniques such as federated learning have enabled multiple workers to train machine learning models together to reduce the overall training time. However, current distributed training algorithms (centralized or decentralized) suffer from the communication bottleneck on multiple low-bandwidth workers (and also on the server under the centralized architecture). Although decentralized algorithms generally have lower communication complexity than their centralized counterparts, they still suffer from the communication bottleneck for workers with low network bandwidth. To deal with the communication problem while preserving convergence performance, we introduce a novel decentralized training algorithm with the following key features: 1) It does not require a parameter server to maintain the model during training, which avoids the communication pressure on any single peer. 2) Each worker only needs to communicate with a single peer at each communication round with a highly compressed model, which can significantly reduce the communication traffic on the worker. We theoretically prove that our sparsification algorithm still preserves convergence properties. 3) Each worker dynamically selects its peer at different communication rounds to better utilize the bandwidth resources. We conduct experiments with convolutional neural networks on 32 workers to verify the effectiveness of our proposed algorithm compared to seven existing methods. Experimental results show that our algorithm significantly reduces the communication traffic and generally selects relatively high-bandwidth peers.
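The two mechanisms the abstract highlights, model sparsification and adaptive peer selection, can be sketched generically. This is an illustrative sketch only: the abstract does not spell out the exact compression scheme or selection rule, so a standard top-k magnitude sparsifier and a bandwidth-proportional peer choice are assumed here (the helper names `topk_sparsify` and `select_peer` are ours, not the paper's).

```python
import heapq
import random

def topk_sparsify(values, ratio=0.01):
    """Keep only the k largest-magnitude entries of a flat parameter
    vector, zeroing the rest.

    A generic top-k sparsifier, assumed for illustration; the paper's
    actual compression scheme may differ.
    """
    k = max(1, int(len(values) * ratio))
    # Indices of the k entries with the largest absolute value.
    keep = set(heapq.nlargest(k, range(len(values)),
                              key=lambda i: abs(values[i])))
    return [v if i in keep else 0.0 for i, v in enumerate(values)]

def select_peer(worker_id, bandwidths, rng=random):
    """Pick this round's single communication peer with probability
    proportional to its estimated bandwidth.

    One plausible adaptive rule that favors high-bandwidth peers,
    consistent with (but not necessarily identical to) the paper's.
    """
    candidates = [w for w in range(len(bandwidths)) if w != worker_id]
    weights = [bandwidths[w] for w in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

Each round, a worker would sparsify its local model (or model update) with `topk_sparsify` and exchange it with the single peer returned by `select_peer`, so per-round traffic scales with `ratio` times the model size rather than the full model.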