Paper title
DPCrowd: Privacy-preserving and Communication-efficient Decentralized Statistical Estimation for Real-time Crowd-sourced Data
Paper authors
Abstract
In Internet of Things (IoT) driven smart-world systems, real-time crowd-sourced databases held by multiple distributed servers can be aggregated to extract dynamic statistics over a larger population, yielding more reliable knowledge for society. In particular, distributed servers in a decentralized network can perform real-time collaborative statistical estimation by disseminating statistics computed from their separate databases. Although no raw data is shared, the real-time statistics can still compromise the privacy of crowd-sourcing participants. A traditional differential privacy (DP) mechanism could simply perturb the statistics at each timestamp, independently for each dimension, but on real-time, multi-dimensional crowd-sourced data this incurs a large utility loss. Real-time broadcasting also imposes significant communication overhead on the whole network. To address these issues, we propose a novel privacy-preserving and communication-efficient decentralized statistical estimation algorithm, DPCrowd, which exploits the temporal correlations in real-time crowd-sourced data and only requires intermittently sharing DP-protected parameters with one-hop neighbors. Then, by further considering spatial correlations, we develop an enhanced algorithm, DPCrowd+, to handle multi-dimensional infinite crowd-data streams. Extensive experiments on several datasets demonstrate that DPCrowd and DPCrowd+ significantly outperform existing schemes in providing accurate, consensus estimates with rigorous privacy protection and high communication efficiency.
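To make the utility-loss argument concrete, the following is a minimal sketch of the naive baseline the abstract describes: a Laplace mechanism applied at every timestamp and independently in every dimension. It is not the DPCrowd algorithm. The function names, the finite horizon T, the even budget split under sequential composition, and the assumption that each value lies in [0, value_range] are all illustrative choices that do not come from the paper.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. Exp(1) draws is Laplace(0, 1); rescale it.
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noise_scale(total_epsilon, T, d, n, value_range=1.0):
    # Assumption: the total budget is split evenly over T timestamps and
    # d dimensions, so each released mean gets total_epsilon / (T * d).
    # Replacing one of n records moves a mean by at most value_range / n.
    eps_each = total_epsilon / (T * d)
    sensitivity = value_range / n
    return sensitivity / eps_each


def naive_dp_means(stream, total_epsilon, value_range=1.0, seed=0):
    """Release a noisy mean per timestamp and per dimension.

    stream: list of timestamps; each timestamp is a list of records;
    each record is a list of d values assumed to lie in [0, value_range].
    """
    rng = random.Random(seed)
    T = len(stream)
    d = len(stream[0][0])
    released = []
    for records in stream:
        n = len(records)
        scale = noise_scale(total_epsilon, T, d, n, value_range)
        noisy = []
        for j in range(d):
            mean_j = sum(r[j] for r in records) / n
            noisy.append(mean_j + laplace_noise(scale, rng))
        released.append(noisy)
    return released
```

Under these assumptions the noise scale grows linearly in both the number of timestamps and the number of dimensions, which is the utility loss the abstract attributes to independent per-timestamp, per-dimension perturbation. For an infinite stream this even split is not even well defined, which is one reason a different budget strategy is needed.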