Paper Title
Generalization Error for Linear Regression under Distributed Learning
Paper Authors
Paper Abstract
Distributed learning facilitates the scaling-up of data processing by distributing the computational burden over several nodes. Despite the vast interest in distributed learning, generalization performance of such approaches is not well understood. We address this gap by focusing on a linear regression setting. We consider the setting where the unknowns are distributed over a network of nodes. We present an analytical characterization of the dependence of the generalization error on the partitioning of the unknowns over nodes. In particular, for the overparameterized case, our results show that while the error on training data remains in the same range as that of the centralized solution, the generalization error of the distributed solution increases dramatically compared to that of the centralized solution when the number of unknowns estimated at any node is close to the number of observations. We further provide numerical examples to verify our analytical expressions.