论文标题
私人高维假设检验
Private High-Dimensional Hypothesis Testing
论文作者
论文摘要
我们为高维分布的身份测试提供了改进的差异私有算法。具体来说,对于具有已知协方差$σ$的$ d $二维高斯分布,我们可以测试分布是否来自$ \ nathcal {n}(μ^*,σ)$,对于某些固定的$μ^*$还是来自某些$ \ natercal {n}(μ,σ)$ $ $ $ $ aint $ nation $ ainty $^n n n $ ain σ)$带有$(\ varepsilon,0)$ - 微分隐私,仅使用\ [\ tilde {o} \ left(\ frac {d^{1/2}}} {α^2} + \ frac {d} + \ frac {d^{1/3}} \ varepsilon^{2/3}}} + \ frac {1} {α\ cdot \ varepsilon} \ right)\] \] 样品如果允许该算法计算效率低下,并且仅\ [\ tilde {o} \ left(\ frac {d^{1/2}} {α^2} + \ \ \ \\ frac {d^{d^{1/4}}} 用于计算有效算法的样品。我们还提供了一个匹配的下限,表明我们的计算效率低下的算法具有最佳的样品复杂性。我们还将算法扩展到各种相关问题,包括对具有有限但未知协方差的高斯人的平均测试,对$ \ { - 1,1,1,1 \}^d $的产品分布进行均匀性测试以及耐受性测试。 我们的结果改善了Canonne等人的先前最佳工作。 $ o \ left(\ frac {\ sqrt {d}} {α^2} \ right)$在许多标准参数设置中。 此外,我们的结果表明,令人惊讶的是,可以用$ d $二维高斯人进行的私人身份测试可以用少于$ d $ d $ d $ \ cite {acharyasz18}的离散分布的私人身份测试来完成样本,从而重新构想了一个猜测的〜〜 \ cite {canonnekmuzmuzmuzmuzmuzmuzmuzmuzmuzmuzmuzmuzmuzmuzmuzmuzmuzmuzmuzmuzmuzmuzmuzmuzmuzmuzmuzmuzmuzmuzmuzmuzmuzmuzmuz20}。
We provide improved differentially private algorithms for identity testing of high-dimensional distributions. Specifically, for $d$-dimensional Gaussian distributions with known covariance $Σ$, we can test whether the distribution comes from $\mathcal{N}(μ^*, Σ)$ for some fixed $μ^*$ or from some $\mathcal{N}(μ, Σ)$ with total variation distance at least $α$ from $\mathcal{N}(μ^*, Σ)$ with $(\varepsilon, 0)$-differential privacy, using only \[\tilde{O}\left(\frac{d^{1/2}}{α^2} + \frac{d^{1/3}}{α^{4/3} \cdot \varepsilon^{2/3}} + \frac{1}{α\cdot \varepsilon}\right)\] samples if the algorithm is allowed to be computationally inefficient, and only \[\tilde{O}\left(\frac{d^{1/2}}{α^2} + \frac{d^{1/4}}{α\cdot \varepsilon}\right)\] samples for a computationally efficient algorithm. We also provide a matching lower bound showing that our computationally inefficient algorithm has optimal sample complexity. We also extend our algorithms to various related problems, including mean testing of Gaussians with bounded but unknown covariance, uniformity testing of product distributions over $\{-1, 1\}^d$, and tolerant testing. Our results improve over the previous best work of Canonne et al.~\cite{CanonneKMUZ20} for both computationally efficient and inefficient algorithms, and even our computationally efficient algorithm matches the optimal \emph{non-private} sample complexity of $O\left(\frac{\sqrt{d}}{α^2}\right)$ in many standard parameter settings. In addition, our results show that, surprisingly, private identity testing of $d$-dimensional Gaussians can be done with fewer samples than private identity testing of discrete distributions over a domain of size $d$ \cite{AcharyaSZ18}, which refutes a conjectured lower bound of~\cite{CanonneKMUZ20}.