Paper Title
Effective Federated Adaptive Gradient Methods with Non-IID Decentralized Data
Paper Authors
Paper Abstract
Federated learning allows a large number of edge computing devices to collaboratively learn a global model without sharing data. Analysis with partial device participation under non-IID and unbalanced data better reflects real-world deployments. In this work, we propose federated learning versions of adaptive gradient methods - Federated AGMs - which employ both first-order and second-order momenta, to alleviate the deterioration in generalization performance caused by the dissimilarity of data populations across devices. To further improve test performance, we compare several calibration schemes for the adaptive learning rate, including the standard Adam calibrated by $\epsilon$, $p$-Adam, and one calibrated by an activation function. Our analysis provides the first set of theoretical results showing that the proposed (calibrated) Federated AGMs converge to a first-order stationary point under non-IID and unbalanced data settings for nonconvex optimization. We perform extensive experiments comparing these federated learning methods with the state-of-the-art FedAvg, FedMomentum, and SCAFFOLD, and assess the different calibration schemes and the advantages of AGMs over current federated learning methods.
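To make the idea concrete, the following is a minimal sketch, in the spirit of the abstract, of a server-side adaptive update that uses both first- and second-order momenta with the standard $\epsilon$ calibration. All function names, hyperparameters, and the toy non-IID objective are our own illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def local_sgd(w, grad_fn, steps=5, lr=0.1):
    """Run a few local SGD steps on one client; return the model delta."""
    w_local = w.copy()
    for _ in range(steps):
        w_local -= lr * grad_fn(w_local)
    return w_local - w

def federated_agm_round(w, m, v, client_grads, server_lr=0.05,
                        beta1=0.9, beta2=0.99, eps=1e-3):
    """One communication round (illustrative): average client deltas,
    then apply an Adam-style server update with first-order momentum m
    and second-order momentum v, calibrated by eps."""
    deltas = [local_sgd(w, g) for g in client_grads]
    avg_delta = np.mean(deltas, axis=0)          # pseudo-gradient direction
    m = beta1 * m + (1 - beta1) * avg_delta      # first-order momentum
    v = beta2 * v + (1 - beta2) * avg_delta**2   # second-order momentum
    w = w + server_lr * m / (np.sqrt(v) + eps)   # epsilon-calibrated step
    return w, m, v

# Toy non-IID setup: each client minimizes ||w - c_k||^2 for a different c_k,
# so local optima disagree while the global optimum is the mean of the c_k.
targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([2.0, 2.0])]
client_grads = [lambda w, c=c: 2.0 * (w - c) for c in targets]

w, m, v = np.zeros(2), np.zeros(2), np.zeros(2)
for _ in range(50):
    w, m, v = federated_agm_round(w, m, v, client_grads)
```

Because the clients' local optima differ, plain local SGD drifts toward each client's own target; the momentum-smoothed, $\epsilon$-calibrated server step damps this drift and moves the global model toward the mean of the client optima.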