Title
Robust Linear Regression: Optimal Rates in Polynomial Time
Authors
Abstract
We obtain robust and computationally efficient estimators for learning several linear models that achieve statistically optimal convergence rate under minimal distributional assumptions. Concretely, we assume our data is drawn from a $k$-hypercontractive distribution and an $ε$-fraction is adversarially corrupted. We then describe an estimator that converges to the optimal least-squares minimizer for the true distribution at a rate proportional to $ε^{2-2/k}$, when the noise is independent of the covariates. We note that no such estimator was known prior to our work, even with access to unbounded computation. The rate we achieve is information-theoretically optimal and thus we resolve the main open question in Klivans, Kothari and Meka [COLT'18]. Our key insight is to identify an analytic condition that serves as a polynomial relaxation of independence of random variables. In particular, we show that when the moments of the noise and covariates are negatively-correlated, we obtain the same rate as independent noise. Further, when the condition is not satisfied, we obtain a rate proportional to $ε^{2-4/k}$, and again match the information-theoretic lower bound. Our central technical contribution is to algorithmically exploit independence of random variables in the "sum-of-squares" framework by formulating it as the aforementioned polynomial inequality.
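The corruption model in the abstract can be illustrated with a small simulation: draw regression data from a well-behaved (hypercontractive) distribution, let an adversary overwrite an $ε$-fraction of the labels, and compare ordinary least squares against a robust baseline. The sketch below uses iteratively trimmed least squares as that baseline purely for illustration; it is not the paper's sum-of-squares algorithm, and all names and parameters are hypothetical.

```python
# Toy illustration of the eps-corrupted regression setting.
# NOT the paper's sum-of-squares estimator; trimmed least squares
# is a simple stand-in robust baseline.
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 2000, 5, 0.1            # samples, dimension, corruption fraction
w_true = np.ones(d)

X = rng.normal(size=(n, d))          # Gaussian covariates (hypercontractive)
y = X @ w_true + rng.normal(scale=0.1, size=n)  # noise independent of X

# Adversary corrupts an eps-fraction of the labels.
m = int(eps * n)
y[:m] = 100.0

def ols(X, y):
    """Ordinary least squares via numpy's lstsq."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def trimmed_ls(X, y, eps, iters=10):
    """Toy robust baseline: repeatedly refit OLS on the (1 - 2*eps)
    fraction of points with the smallest absolute residuals."""
    w = ols(X, y)
    for _ in range(iters):
        resid = np.abs(y - X @ w)
        keep = np.argsort(resid)[: int((1 - 2 * eps) * len(y))]
        w = ols(X[keep], y[keep])
    return w

w_ols = ols(X, y)
w_rob = trimmed_ls(X, y, eps)
print("OLS error:    ", np.linalg.norm(w_ols - w_true))
print("Trimmed error:", np.linalg.norm(w_rob - w_true))
```

On this instance the corrupted labels pull plain OLS far from `w_true`, while the trimmed fit identifies the outliers by their large residuals and recovers the parameters to within the noise level, matching the qualitative picture of the abstract (though without its $ε^{2-2/k}$ rate guarantee).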