Title
Machine Unlearning of Federated Clusters
Authors
Abstract
Federated clustering (FC) is an unsupervised learning problem that arises in a number of practical applications, including personalized recommender and healthcare systems. With the adoption of recent laws ensuring the "right to be forgotten", the problem of machine unlearning for FC methods has become of significant importance. We introduce, for the first time, the problem of machine unlearning for FC, and propose an efficient unlearning mechanism for a customized secure FC framework. Our FC framework utilizes special initialization procedures that we show are well-suited for unlearning. To protect client data privacy, we develop the secure compressed multiset aggregation (SCMA) framework, which addresses the sparse secure federated learning (FL) problems encountered during clustering as well as more general problems. To simultaneously achieve low communication complexity and support secret sharing protocols, we integrate Reed-Solomon encoding with special evaluation points into our SCMA pipeline, and prove that the client communication cost is logarithmic in the vector dimension. Additionally, to demonstrate the benefits of our unlearning mechanism over complete retraining, we provide a theoretical analysis of the unlearning performance of our approach. Simulation results show that the new FC framework exhibits superior clustering performance compared to previously reported FC baselines when the cluster sizes are highly imbalanced. Compared to completely retraining K-means++ locally and globally for each removal request, our unlearning procedure offers an average speed-up of roughly 84x across seven datasets. Our implementation of the proposed method is available at https://github.com/thupchnsky/mufc.
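The retraining baseline mentioned above is K-means++, whose seeding step picks each new center with probability proportional to the squared distance from the nearest center already chosen. As a point of reference (this is the standard K-means++ algorithm, not the paper's customized initialization), a minimal 1-D sketch of that seeding step looks like this; the function name and the use of scalar points are illustrative choices:

```python
import random

def kmeans_pp_init(points, k, rng=None):
    """Standard k-means++ seeding (1-D sketch).

    Pick the first center uniformly at random; pick each subsequent
    center with probability proportional to its squared distance to
    the nearest center chosen so far (D^2 sampling).
    """
    rng = rng or random.Random(0)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest current center.
        d2 = [min((p - c) ** 2 for c in centers) for p in points]
        # Draw a point with probability proportional to d2 (inverse CDF).
        r = rng.random() * sum(d2)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(p)
                break
    return centers

# Three well-separated groups; seeding tends to pick one center per group.
pts = [0.0, 0.1, 0.2, 10.0, 10.1, 20.0]
centers = kmeans_pp_init(pts, k=3)
```

Fully retraining this seeding plus the subsequent Lloyd iterations after every deletion request is the baseline the paper's unlearning mechanism is compared against.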