Paper Title
DP-PCA: Statistically Optimal and Differentially Private PCA
Paper Authors
Paper Abstract
We study the canonical statistical task of computing the principal component from $n$ i.i.d.~data in $d$ dimensions under $(\varepsilon,\delta)$-differential privacy. Although extensively studied in the literature, existing solutions fall short in two key aspects: ($i$) even for Gaussian data, existing private algorithms require the number of samples $n$ to scale super-linearly with $d$, i.e., $n=\Omega(d^{3/2})$, to obtain non-trivial results, while non-private PCA requires only $n=O(d)$, and ($ii$) existing techniques suffer from a non-vanishing error even when the randomness in each data point is arbitrarily small. We propose DP-PCA, a single-pass algorithm that overcomes both limitations. It is based on a private minibatch gradient ascent method that relies on {\em private mean estimation}, which adds the minimal noise required to ensure privacy by adapting to the variance of a given minibatch of gradients. For sub-Gaussian data, we provide nearly optimal statistical error rates even for $n=\tilde O(d)$. Furthermore, we provide a lower bound showing that a sub-Gaussian-style assumption is necessary for obtaining the optimal error rate.
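To make the flavor of the method concrete, here is a minimal Python sketch of a single-pass private minibatch gradient ascent (Oja-style) update for the top principal component. It is not the paper's DP-PCA algorithm: the learning rate, clipping threshold, batch size, and the basic Gaussian-mechanism noise calibration below are illustrative assumptions, whereas the paper's private mean estimator adapts the injected noise to the empirical variance of each minibatch of gradients.

```python
import numpy as np

def dp_pca_sketch(X, eps, delta, batch_size=64, lr=0.1, clip=1.0, seed=0):
    """Illustrative sketch only (not the authors' DP-PCA): one pass of
    private minibatch gradient ascent for the leading eigenvector."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    num_batches = n // batch_size
    # Per-step Gaussian noise scale from a crude (eps, delta) composition bound
    # over num_batches releases -- an assumption standing in for the paper's
    # tighter, variance-adaptive private mean estimation.
    sigma = clip * np.sqrt(2 * num_batches * np.log(1.25 / delta)) / eps
    for t in range(num_batches):
        B = X[t * batch_size:(t + 1) * batch_size]    # single pass: each sample used once
        grads = (B @ v)[:, None] * B                  # per-sample Oja gradients (x x^T) v
        norms = np.linalg.norm(grads, axis=1, keepdims=True)
        grads = grads * np.minimum(1.0, clip / norms) # clip to bound per-sample sensitivity
        noisy_mean = grads.mean(axis=0) + rng.normal(0.0, sigma / batch_size, d)
        v = v + lr * noisy_mean                       # noisy gradient ascent step
        v /= np.linalg.norm(v)                        # project back onto the unit sphere
    return v
```

The sketch keeps the two structural features the abstract emphasizes: each data point is touched exactly once (single pass), and the only privatized quantity per step is the mean of a minibatch of clipped gradients.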