Data Heterogeneity-Robust Federated Learning via Group Client Selection in Industrial IoT

Paper Title

Data Heterogeneity-Robust Federated Learning via Group Client Selection in Industrial IoT

Authors

Li, Zonghang; He, Yihong; Yu, Hongfang; Kang, Jiawen; Li, Xiaoping; Xu, Zenglin; Niyato, Dusit

Abstract

Nowadays, the industrial Internet of Things (IIoT) has played an integral role in Industry 4.0 and produced massive amounts of data for industrial intelligence. These data reside on decentralized devices in modern factories. To protect the confidentiality of industrial data, federated learning (FL) was introduced to collaboratively train shared machine learning models. However, the local data collected by different devices are skewed in class distribution, which degrades industrial FL performance. This challenge has been widely studied at the mobile edge, but existing studies ignore the rapidly changing streaming data and the clustering nature of factory devices and, more seriously, may threaten data security. In this paper, we propose FedGS, a hierarchical cloud-edge-end FL framework for 5G-empowered industries, to improve industrial FL performance on non-i.i.d. data. Taking advantage of naturally clustered factory devices, FedGS uses a gradient-based binary permutation algorithm (GBP-CS) to select a subset of devices within each factory and build homogeneous super nodes that participate in FL training. We then propose a compound-step synchronization protocol to coordinate the training process within and among these super nodes, which shows great robustness against data heterogeneity. The proposed methods are time-efficient and can adapt to dynamic environments without exposing confidential industrial data to risky manipulation. We prove that FedGS has better convergence performance than FedAvg and give a relaxed condition under which FedGS is more communication-efficient. Extensive experiments show that FedGS improves accuracy by 3.5% and reduces training rounds by 59% on average, confirming its superior effectiveness and efficiency on non-i.i.d. data.
