Paper Title
(Nearly) Optimal Private Linear Regression via Adaptive Clipping
Paper Authors
Paper Abstract
We study the problem of differentially private linear regression where each data point is sampled from a fixed sub-Gaussian style distribution. We propose and analyze a one-pass mini-batch stochastic gradient descent method (DP-AMBSSGD) where points in each iteration are sampled without replacement. Noise is added for DP, but the noise standard deviation is estimated online. Compared to existing $(ε, δ)$-DP techniques which have sub-optimal error bounds, DP-AMBSSGD is able to provide nearly optimal error bounds in terms of key parameters like the dimensionality $d$, the number of points $N$, and the standard deviation $σ$ of the noise in the observations. For example, when the $d$-dimensional covariates are sampled i.i.d. from the normal distribution, the excess error of DP-AMBSSGD due to privacy is $\frac{σ^2 d}{N}\big(1+\frac{d}{ε^2 N}\big)$, i.e., the error is meaningful when the number of samples $N = Ω(d \log d)$, which is the standard operative regime for linear regression. In contrast, the error bound for existing efficient methods in this setting is $\mathcal{O}\big(\frac{d^3}{ε^2 N^2}\big)$, even for $σ=0$. That is, for constant $ε$, existing techniques require $N = Ω(d\sqrt{d})$ to provide a non-trivial result.
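To make the high-level description concrete, the following is a minimal NumPy sketch of a one-pass mini-batch DP-SGD loop for linear regression with per-example clipping and an online noise-scale estimate, in the spirit of DP-AMBSSGD. The function name `dp_ambssgd_sketch`, the step-size choice, the clipping rule, the residual-based update of the noise estimate, and the Gaussian-mechanism calibration are illustrative assumptions for exposition; they are not the paper's exact algorithm, privacy accounting, or constants.

```python
import numpy as np

def dp_ambssgd_sketch(X, y, eps, delta, num_batches=None, lr=None, rng=None):
    """Illustrative one-pass mini-batch DP-SGD for linear regression with
    adaptive clipping. Hypothetical simplification: the clipping threshold,
    step size, and noise calibration below are stand-ins, not the paper's
    exact DP-AMBSSGD choices."""
    rng = np.random.default_rng() if rng is None else rng
    N, d = X.shape
    num_batches = num_batches or max(1, N // max(d, 1))
    batch_size = N // num_batches
    # Crude step size based on the average squared covariate norm (assumption).
    lr = lr if lr is not None else 1.0 / (2.0 * np.mean(np.sum(X ** 2, axis=1)))
    w = np.zeros(d)
    sigma_hat = np.std(y)  # initial online estimate of the observation noise std
    # Simplified Gaussian-mechanism noise multiplier for (eps, delta)-DP per batch.
    noise_mult = np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    for t in range(num_batches):
        # One pass over the data: each point is used at most once
        # (sampling without replacement).
        Xb = X[t * batch_size:(t + 1) * batch_size]
        yb = y[t * batch_size:(t + 1) * batch_size]
        resid = Xb @ w - yb
        grads = resid[:, None] * Xb  # per-example gradients of the squared loss
        # Clip per-example gradient norms; the threshold adapts to sigma_hat.
        clip = sigma_hat * np.sqrt(d) + 1e-12
        norms = np.linalg.norm(grads, axis=1) + 1e-12
        grads *= np.minimum(1.0, clip / norms)[:, None]
        g = grads.mean(axis=0)
        # Add Gaussian noise scaled to the clipping threshold for privacy.
        g += rng.normal(0.0, noise_mult * clip / batch_size, size=d)
        w -= lr * g
        # Online update of the noise standard-deviation estimate from residuals.
        sigma_hat = 0.9 * sigma_hat + 0.1 * np.std(resid)
    return w
```

The key point the sketch illustrates is that the clipping threshold and the added noise scale with the current estimate of the observation noise $σ$ rather than with a fixed worst-case bound, which is what allows the privacy error to scale as $\frac{σ^2 d}{N}\cdot\frac{d}{ε^2 N}$ instead of the $σ$-independent $\frac{d^3}{ε^2 N^2}$ bound of prior efficient methods.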