Paper Title
Parallel Successive Learning for Dynamic Distributed Model Training over Heterogeneous Wireless Networks
Paper Authors
Paper Abstract
Federated learning (FedL) has emerged as a popular technique for distributing model training over a set of wireless devices, via iterative local updates (at devices) and global aggregations (at the server). In this paper, we develop parallel successive learning (PSL), which expands the FedL architecture along three dimensions: (i) Network, allowing decentralized cooperation among the devices via device-to-device (D2D) communications. (ii) Heterogeneity, interpreted at three levels: (ii-a) Learning: PSL considers a heterogeneous number of stochastic gradient descent iterations with different mini-batch sizes at the devices; (ii-b) Data: PSL presumes a dynamic environment with data arrival and departure, where the distributions of local datasets evolve over time, captured via a new metric for model/concept drift; (ii-c) Device: PSL considers devices with different computation and communication capabilities. (iii) Proximity, where devices have different distances to each other and to the access point. PSL considers the realistic scenario in which global aggregations are conducted with idle times in between them for resource efficiency, and incorporates data dispersion and model dispersion with local model condensation into FedL. Our analysis sheds light on the notions of cold vs. warmed-up models and model inertia in distributed machine learning. We then propose network-aware dynamic model tracking to optimize the tradeoff between model learning and resource efficiency, which we show is an NP-hard signomial programming problem. We finally solve this problem by proposing a general optimization solver. Our numerical results reveal new findings on the interdependencies between the idle times in between global aggregations, model/concept drift, and the D2D cooperation configuration.
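A minimal, hypothetical Python sketch of the kind of heterogeneous local training the abstract describes is given below; it is not the authors' PSL implementation. Each device runs its own number of SGD iterations with its own mini-batch size on a toy local dataset (learning heterogeneity), after which the server forms a data-weighted parameter average (a FedAvg-style aggregation, used here purely for illustration). All names (make_device, local_sgd, global_aggregate) and the toy linear-regression objective are assumptions made for the sketch.

```python
# Illustrative sketch only -- not the paper's PSL algorithm.
# Assumed setup: each device holds a local dataset and runs its own number of
# mini-batch SGD iterations with its own batch size, after which the server
# forms a data-weighted parameter average (FedAvg-style aggregation).
import numpy as np

rng = np.random.default_rng(0)
dim = 5  # model dimension (toy linear-regression model, assumed)

def make_device(n_samples):
    """Create a toy device with a local linear-regression dataset."""
    X = rng.normal(size=(n_samples, dim))
    y = X @ np.ones(dim) + 0.1 * rng.normal(size=n_samples)
    return {"X": X, "y": y}

def local_sgd(w, device, n_iters, batch_size, lr=0.05):
    """Run n_iters mini-batch SGD steps on the device's local squared loss."""
    X, y = device["X"], device["y"]
    for _ in range(n_iters):
        idx = rng.choice(len(y), size=min(batch_size, len(y)), replace=False)
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
        w = w - lr * grad
    return w

def global_aggregate(local_models, weights):
    """Data-weighted parameter average (FedAvg-style) at the server."""
    weights = np.asarray(weights, dtype=float)
    weights /= weights.sum()
    return sum(wk * m for wk, m in zip(weights, local_models))

# Devices differ in data size, number of SGD iterations, and mini-batch size.
devices = [make_device(n) for n in (200, 80, 500)]
iters = [10, 3, 25]      # heterogeneous numbers of local SGD iterations
batches = [32, 8, 64]    # heterogeneous mini-batch sizes

w_global = np.zeros(dim)  # "cold" model before any training
for _ in range(20):       # global aggregations with local work in between
    local_models = [local_sgd(w_global, d, k, b)
                    for d, k, b in zip(devices, iters, batches)]
    w_global = global_aggregate(local_models, [len(d["y"]) for d in devices])

print("distance to ground-truth model:", np.linalg.norm(w_global - np.ones(dim)))
```

In the paper's setting, the idle time between such aggregation rounds, D2D cooperation among devices, and drifting local data distributions would all influence how quickly a cold starting model warms up, which is the tradeoff the proposed network-aware dynamic model tracking is designed to optimize.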