使用公共机制自适应地回答私人线性查询

论文标题

使用公共机制自适应地回答私人线性查询

Answering Private Linear Queries Adaptively using the Common Mechanism

论文作者

Xiao, Yingtai, Wang, Guanhong, Zhang, Danfeng, Kifer, Daniel

论文摘要

在通过隐私过滤器分析机密数据时，数据科学家通常需要决定哪些查询最能支持其预期的分析。例如，分析师可能希望在机构M1生成的数据集中研究嘈杂的双向边缘。但是，如果数据相对稀疏，则分析师可能会选择检查由M2机制产生的嘈杂的单向边缘。由于选择是否使用M1还是M2是数据依赖性的，因此典型的私人工作流程是首先将隐私损失预算RHO分为两个部分：Rho1和Rho2，然后使用第一部分RHO1来确定要使用的机制，以及其余的Rho2，以从所选机制中获得嘈杂的答案。从某种意义上说，第一步似乎很浪费，因为它剥夺了本可以用来使查询答案更准确的隐私损失预算的一部分。在本文中，我们考虑了是否可以在不浪费任何隐私损失预算的情况下执行M1和M2之间的选择。对于线性查询，我们提出了一种将M1和M2分解为三个部分的方法：（1）一种捕获其共享信息的机制，（2）一种机制M1'捕获特定于M1的信息，（3）一种机制M2'捕获特定于M2的信息。一起运行M*和M1'完全等同于运行M1（在查询答案准确性和总隐私成本方面，RHO）。同样，一起运行M*和M2'完全等同于运行M2。由于无论如何都将使用M*，因此分析师可以使用其输出来决定是否随后运行M1'（从而重现M1支持的分析）或M2'（重新创建M2支持的分析），而不会浪费隐私损失预算。

When analyzing confidential data through a privacy filter, a data scientist often needs to decide which queries will best support their intended analysis. For example, an analyst may wish to study noisy two-way marginals in a dataset produced by a mechanism M1. But, if the data are relatively sparse, the analyst may choose to examine noisy one-way marginals, produced by a mechanism M2 instead. Since the choice of whether to use M1 or M2 is data-dependent, a typical differentially private workflow is to first split the privacy loss budget rho into two parts: rho1 and rho2, then use the first part rho1 to determine which mechanism to use, and the remainder rho2 to obtain noisy answers from the chosen mechanism. In a sense, the first step seems wasteful because it takes away part of the privacy loss budget that could have been used to make the query answers more accurate. In this paper, we consider the question of whether the choice between M1 and M2 can be performed without wasting any privacy loss budget. For linear queries, we propose a method for decomposing M1 and M2 into three parts: (1) a mechanism M* that captures their shared information, (2) a mechanism M1' that captures information that is specific to M1, (3) a mechanism M2' that captures information that is specific to M2. Running M* and M1' together is completely equivalent to running M1 (both in terms of query answer accuracy and total privacy cost rho). Similarly, running M* and M2' together is completely equivalent to running M2. Since M* will be used no matter what, the analyst can use its output to decide whether to subsequently run M1'(thus recreating the analysis supported by M1) or M2'(recreating the analysis supported by M2), without wasting privacy loss budget.

下载PDF全文

下载文献需遵守相关版权规定

论文标题