Paper Title
Communication-Efficient Federated Learning for Heterogeneous Edge Devices Based on Adaptive Gradient Quantization
Paper Authors
Paper Abstract
Federated learning (FL) enables geographically dispersed edge devices (i.e., clients) to learn a global model without sharing their local datasets, where each client performs gradient descent on its local data and uploads the gradients to a central server to update the global model. However, FL incurs massive communication overhead resulting from uploading the gradients in each training round. To address this problem, most existing research compresses the gradients with a fixed and unified quantization level for all clients, which neither adapts the quantization to the varying gradient norms across rounds, nor exploits the heterogeneity of the clients to accelerate FL. In this paper, we propose a novel adaptive and heterogeneous gradient quantization algorithm (AdaGQ) for FL to minimize the wall-clock training time from two aspects: i) adaptive quantization, which exploits the change of the gradient norm to adjust the quantization resolution in each training round; and ii) heterogeneous quantization, which assigns lower quantization resolution to slow clients to align their training time with that of other clients and mitigate the communication bottleneck, and higher quantization resolution to fast clients to achieve a better tradeoff between communication efficiency and accuracy. Evaluations on various models and datasets validate the benefits of AdaGQ, reducing the total training time by up to 52.1% compared to baseline algorithms (e.g., FedAvg, QSGD).
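As a rough illustration of the kind of gradient quantization the abstract refers to, below is a minimal sketch of QSGD-style stochastic quantization in Python (NumPy), where the number of quantization levels can differ per round and per client. The level-selection values shown in the usage example are illustrative assumptions; the abstract does not specify AdaGQ's actual policy for choosing them.

```python
import numpy as np

def qsgd_quantize(grad, num_levels):
    """QSGD-style unbiased stochastic quantization of a gradient vector.

    Each coordinate's magnitude is scaled by the vector norm, mapped to one of
    `num_levels` uniform buckets with stochastic rounding, and reconstructed as
    sign * (level / num_levels) * norm. Fewer levels -> fewer bits uploaded,
    but higher quantization variance.
    """
    norm = np.linalg.norm(grad)
    if norm == 0:
        return np.zeros_like(grad)
    scaled = np.abs(grad) / norm * num_levels          # values in [0, num_levels]
    lower = np.floor(scaled)
    prob = scaled - lower                              # stochastic-rounding probability
    levels = lower + (np.random.rand(*grad.shape) < prob)
    return np.sign(grad) * levels / num_levels * norm  # unbiased estimate of grad

# Hypothetical usage: a slower client uploads at coarser resolution than a
# faster one, reflecting the heterogeneous-quantization idea in the abstract.
grad = np.random.randn(1000)
coarse = qsgd_quantize(grad, num_levels=4)    # slow client: fewer bits per coordinate
fine = qsgd_quantize(grad, num_levels=16)     # fast client: higher resolution
```

Because the quantizer is unbiased, the server can still average the uploaded gradients across clients as in FedAvg; the per-round and per-client choice of `num_levels` is where an adaptive scheme such as AdaGQ would intervene.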