Paper Title
Controlling Privacy Loss in Sampling Schemes: an Analysis of Stratified and Cluster Sampling
Paper Authors
Paper Abstract
Sampling schemes are fundamental tools in statistics, survey design, and algorithm design. A fundamental result in differential privacy is that a differentially private mechanism run on a simple random sample of a population provides stronger privacy guarantees than the same algorithm run on the entire population. However, in practice, sampling designs are often more complex than the simple, data-independent sampling schemes that are addressed in prior work. In this work, we extend the study of privacy amplification results to more complex, data-dependent sampling schemes. We find that not only do these sampling schemes often fail to amplify privacy, they can actually result in privacy degradation. We analyze the privacy implications of the pervasive cluster sampling and stratified sampling paradigms, and provide some insight into the study of more general sampling designs.
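For context, the amplification result for simple, data-independent sampling mentioned in the abstract can be sketched numerically. The snippet below uses the commonly cited bound for the case where each record is included independently with probability q: an ε-differentially private mechanism run on the sample satisfies ε'-differential privacy with ε' = ln(1 + q(e^ε − 1)). This is only an illustration of the prior-work baseline, not the paper's analysis of stratified or cluster sampling; the function name `amplified_epsilon` and its parameters are assumptions introduced here.

```python
import math


def amplified_epsilon(epsilon: float, q: float) -> float:
    """Privacy parameter after data-independent subsampling (illustrative sketch).

    Assumes each record enters the sample independently with probability q
    and that the base mechanism is epsilon-differentially private.
    Returns ln(1 + q * (exp(epsilon) - 1)).
    """
    if epsilon < 0.0:
        raise ValueError("epsilon must be non-negative")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    # log1p and expm1 keep the computation accurate for small epsilon or q.
    return math.log1p(q * math.expm1(epsilon))


if __name__ == "__main__":
    # Smaller sampling rates give smaller effective epsilon values.
    for rate in (1.0, 0.5, 0.1, 0.01):
        print(f"q={rate:<5} epsilon'={amplified_epsilon(1.0, rate):.4f}")
```

With q = 1 the bound returns the original ε, and it shrinks toward 0 as q decreases. The abstract's point is that this kind of guarantee does not automatically carry over to data-dependent designs, which can even degrade privacy.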