Paper Title
Double Trouble in Double Descent: Bias and Variance(s) in the Lazy Regime
Paper Authors
Paper Abstract
Deep neural networks can achieve remarkable generalization performance while interpolating the training data perfectly. Rather than the U-curve emblematic of the bias-variance trade-off, their test error often follows a "double descent" - a mark of the beneficial role of overparametrization. In this work, we develop a quantitative theory for this phenomenon in the so-called lazy learning regime of neural networks, by considering the problem of learning a high-dimensional function with random features regression. We obtain a precise asymptotic expression for the bias-variance decomposition of the test error, and show that the bias displays a phase transition at the interpolation threshold, beyond which it remains constant. We disentangle the variances stemming from the sampling of the dataset, from the additive noise corrupting the labels, and from the initialization of the weights. Following up on Geiger et al. (2019), we first show that the latter two contributions are the crux of the double descent: they lead to the overfitting peak at the interpolation threshold and to the decay of the test error upon overparametrization. We then quantify how they are suppressed by ensemble averaging the outputs of K independently initialized estimators. When K is sent to infinity, the test error remains constant beyond the interpolation threshold. We further compare the effects of overparametrizing, ensembling and regularizing. Finally, we present numerical experiments on classic deep learning setups to show that our results hold qualitatively in realistic lazy learning scenarios.
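Schematically (the notation here is assumed for illustration, not taken from the paper), the decomposition described in the abstract can be written as

$$\epsilon_{\text{test}} \;=\; B \;+\; V_{\text{init}} \;+\; V_{\text{noise}} \;+\; V_{\text{samp}},$$

where $B$ is the bias term and the three variance terms stem from the initialization of the (frozen) random features, the additive label noise, and the sampling of the training set; the paper derives precise asymptotic expressions for each contribution in random features regression.

A minimal numerical sketch of that setup follows (this is not the authors' code: the linear teacher, ReLU random features, ridge regularizer, and all sizes are illustrative assumptions). It sweeps the number of random features p across the interpolation threshold p ≈ n and compares a single estimator with an ensemble average over K independently drawn feature matrices; in line with the abstract, the overfitting peak near p ≈ n should be damped by ensembling.

```python
# Illustrative sketch only (not the authors' code): random features ridge
# regression on a noisy linear teacher, with ensemble averaging over K
# independently initialized feature matrices.
import numpy as np

rng = np.random.default_rng(0)
d, n, n_test, sigma, lam, K = 100, 200, 1000, 0.1, 1e-6, 5  # assumed sizes

# Teacher: noisy linear function of Gaussian inputs.
w_star = rng.standard_normal(d) / np.sqrt(d)
X, X_test = rng.standard_normal((n, d)), rng.standard_normal((n_test, d))
y = X @ w_star + sigma * rng.standard_normal(n)
y_test = X_test @ w_star

def rf_predict(p):
    """Ridge regression on p random ReLU features (one random initialization)."""
    F = rng.standard_normal((d, p)) / np.sqrt(d)                # frozen first-layer weights (lazy regime)
    Z, Z_test = np.maximum(X @ F, 0), np.maximum(X_test @ F, 0)
    a = np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y)     # trained readout weights
    return Z_test @ a

# Sweep the number of features across the interpolation threshold p ~ n.
for p in [50, 100, 200, 400, 800]:
    single = np.mean((rf_predict(p) - y_test) ** 2)
    ensemble = np.mean((np.mean([rf_predict(p) for _ in range(K)], axis=0) - y_test) ** 2)
    print(f"p={p:4d}  single={single:.3f}  ensemble(K={K})={ensemble:.3f}")
```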