Paper Title
(Nearly) Optimal Private Linear Regression via Adaptive Clipping
Paper Authors
Paper Abstract
We study the problem of differentially private linear regression where each data point is sampled from a fixed sub-Gaussian style distribution. We propose and analyze a one-pass mini-batch stochastic gradient descent method (DP-AMBSSGD) where points in each iteration are sampled without replacement. Noise is added for DP, but the noise standard deviation is estimated online. Compared to existing $(ε, δ)$-DP techniques which have sub-optimal error bounds, DP-AMBSSGD is able to provide nearly optimal error bounds in terms of key parameters like the dimensionality $d$, the number of points $N$, and the standard deviation $σ$ of the noise in the observations. For example, when the $d$-dimensional covariates are sampled i.i.d. from the normal distribution, the excess error of DP-AMBSSGD due to privacy is $\frac{σ^2 d}{N}\big(1+\frac{d}{ε^2 N}\big)$, i.e., the error is meaningful when the number of samples $N = Ω(d \log d)$, which is the standard operative regime for linear regression. In contrast, the error bound for existing efficient methods in this setting is $\mathcal{O}\big(\frac{d^3}{ε^2 N^2}\big)$, even for $σ=0$. That is, for constant $ε$, existing techniques require $N = Ω(d\sqrt{d})$ to provide a non-trivial result.
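To make the high-level description concrete, the following is a minimal NumPy sketch of a one-pass mini-batch DP-SGD loop for linear regression with per-example clipping and an online noise-scale estimate, in the spirit of DP-AMBSSGD. The function name `dp_ambssgd_sketch`, the step-size choice, the clipping rule, the residual-based update of the noise estimate, and the Gaussian-mechanism calibration are illustrative assumptions for exposition; they are not the paper's exact algorithm, privacy accounting, or constants.

```python
import numpy as np

def dp_ambssgd_sketch(X, y, eps, delta, num_batches=None, lr=None, rng=None):
    """Illustrative one-pass mini-batch DP-SGD for linear regression with
    adaptive clipping. Hypothetical simplification: the clipping threshold,
    step size, and noise calibration below are stand-ins, not the paper's
    exact DP-AMBSSGD choices."""
    rng = np.random.default_rng() if rng is None else rng
    N, d = X.shape
    num_batches = num_batches or max(1, N // max(d, 1))
    batch_size = N // num_batches
    # Crude step size based on the average squared covariate norm (assumption).
    lr = lr if lr is not None else 1.0 / (2.0 * np.mean(np.sum(X ** 2, axis=1)))
    w = np.zeros(d)
    sigma_hat = np.std(y)  # initial online estimate of the observation noise std
    # Simplified Gaussian-mechanism noise multiplier for (eps, delta)-DP per batch.
    noise_mult = np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    for t in range(num_batches):
        # One pass over the data: each point is used at most once
        # (sampling without replacement).
        Xb = X[t * batch_size:(t + 1) * batch_size]
        yb = y[t * batch_size:(t + 1) * batch_size]
        resid = Xb @ w - yb
        grads = resid[:, None] * Xb  # per-example gradients of the squared loss
        # Clip per-example gradient norms; the threshold adapts to sigma_hat.
        clip = sigma_hat * np.sqrt(d) + 1e-12
        norms = np.linalg.norm(grads, axis=1) + 1e-12
        grads *= np.minimum(1.0, clip / norms)[:, None]
        g = grads.mean(axis=0)
        # Add Gaussian noise scaled to the clipping threshold for privacy.
        g += rng.normal(0.0, noise_mult * clip / batch_size, size=d)
        w -= lr * g
        # Online update of the noise standard-deviation estimate from residuals.
        sigma_hat = 0.9 * sigma_hat + 0.1 * np.std(resid)
    return w
```

The key point the sketch illustrates is that the clipping threshold and the added noise scale with the current estimate of the observation noise $σ$ rather than with a fixed worst-case bound, which is what allows the privacy error to scale as $\frac{σ^2 d}{N}\cdot\frac{d}{ε^2 N}$ instead of the $σ$-independent $\frac{d^3}{ε^2 N^2}$ bound of prior efficient methods.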