Paper title
A Deterministic Streaming Sketch for Ridge Regression
Paper authors
Paper abstract
We provide a deterministic space-efficient algorithm for estimating ridge regression. For $n$ data points with $d$ features and a large enough regularization parameter, we provide a solution within $\varepsilon$ L$_2$ error using only $O(d/\varepsilon)$ space. This is the first $o(d^2)$ space deterministic streaming algorithm with guaranteed solution error and risk bound for this classic problem. The algorithm sketches the covariance matrix by variants of Frequent Directions, which implies it can operate in insertion-only streams and a variety of distributed data settings. In comparisons to randomized sketching algorithms on synthetic and real-world datasets, our algorithm has less empirical error using less space and similar time.
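The abstract's idea of sketching the covariance matrix with Frequent Directions and then solving the regularized system can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only says "variants of Frequent Directions", so the code below uses the basic Frequent Directions update with a doubled buffer as a stand-in. The class name `FDRidgeSketch` and the parameters `ell` (sketch size) and `gamma` (regularization parameter) are hypothetical names chosen for illustration, and the example assumes NumPy is available.

```python
import numpy as np


class FDRidgeSketch:
    """Illustrative Frequent Directions sketch for ridge regression.

    Assumption: basic Frequent Directions with a 2*ell row buffer,
    not necessarily the variant analyzed in the paper.
    """

    def __init__(self, d, ell, gamma):
        self.d, self.ell, self.gamma = d, ell, gamma
        self.B = np.zeros((2 * ell, d))  # sketch of the data matrix A
        self.next_row = 0                # index of the next free row in B
        self.c = np.zeros(d)             # exact running sum A^T y

    def update(self, a, y):
        """Process one stream item: feature vector a and response y."""
        if self.next_row == 2 * self.ell:
            self._shrink()
        self.B[self.next_row] = a
        self.next_row += 1
        self.c += y * a

    def _shrink(self):
        """Compress the full buffer back to at most ell nonzero rows."""
        _, s, vt = np.linalg.svd(self.B, full_matrices=False)
        # Subtract the (ell+1)-th squared singular value from all of them;
        # if the rank cannot exceed ell, nothing needs to be subtracted.
        delta = s[self.ell] ** 2 if len(s) > self.ell else 0.0
        s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
        k = min(self.ell, len(s))
        self.B[:] = 0.0
        self.B[:k] = s_shrunk[:k, None] * vt[:k]
        self.next_row = k

    def solve(self):
        """Return the ridge solution computed from the sketch:
        x = (B^T B + gamma I)^{-1} A^T y."""
        gram = self.B.T @ self.B + self.gamma * np.eye(self.d)
        return np.linalg.solve(gram, self.c)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((1000, 30))
    y = A @ rng.standard_normal(30) + 0.1 * rng.standard_normal(1000)

    sketch = FDRidgeSketch(d=30, ell=10, gamma=100.0)
    for a_i, y_i in zip(A, y):
        sketch.update(a_i, y_i)

    x_sketch = sketch.solve()
    x_exact = np.linalg.solve(A.T @ A + 100.0 * np.eye(30), A.T @ y)
    print("relative L2 error:",
          np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact))
```

The sketch stores a `2 * ell` by `d` buffer plus one length-`d` vector, so its memory does not grow with the number of data points $n$. Every step is deterministic, and the covariance approximation error in spectral norm is at most $\|A\|_F^2 / \texttt{ell}$ for this basic variant, which is why a larger regularization parameter makes the solution error easier to control.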