Bias-variance decomposition of overparameterized regression with random linear features

Paper title

Bias-variance decomposition of overparameterized regression with random linear features

Authors

Rocks, Jason W., Mehta, Pankaj

Abstract

In classical statistics, the bias-variance trade-off describes how varying a model's complexity (e.g., number of fit parameters) affects its ability to make accurate predictions. According to this trade-off, optimal performance is achieved when a model is expressive enough to capture trends in the data, yet not so complex that it overfits idiosyncratic features of the training data. Recently, it has become clear that this classic understanding of the bias-variance trade-off must be fundamentally revisited in light of the incredible predictive performance of "overparameterized models" -- models that avoid overfitting even when the number of fit parameters is large enough to perfectly fit the training data. Here, we present results for one of the simplest examples of an overparameterized model: regression with random linear features (i.e., a two-layer neural network with a linear activation function). Using the zero-temperature cavity method, we derive analytic expressions for the training error, test error, bias, and variance. We show that the linear random features model exhibits three phase transitions: two different transitions to an interpolation regime where the training error is zero, along with an additional transition between regimes with large bias and minimal bias. Using random matrix theory, we show how each transition arises due to small nonzero eigenvalues in the Hessian matrix. Finally, we compare and contrast the phase diagram of the random linear features model to the random nonlinear features model and ordinary regression, highlighting the new phase transitions that result from the use of linear basis functions.
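
To make the model in the abstract concrete, the following is a minimal numerical sketch of regression with random linear features, not the authors' analytic cavity-method calculation. Everything specific here is an assumption chosen for illustration: the function name `fit_random_linear_features`, the Gaussian data and teacher vector `beta`, the scaling of the random projection `W`, and the use of a minimum-norm least-squares fit as a stand-in for the ridge-less limit.

```python
import numpy as np


def fit_random_linear_features(n_train=50, n_features=100, n_params=200,
                               noise_std=0.1, n_test=1000, seed=0):
    """Fit a linear model on top of fixed random linear features.

    Hypothetical setup (assumed, not taken from the paper):
    labels come from a linear teacher y = x . beta + noise.
    """
    rng = np.random.default_rng(seed)

    # Assumed linear teacher and Gaussian inputs.
    beta = rng.normal(size=n_features) / np.sqrt(n_features)
    X = rng.normal(size=(n_train, n_features))
    y = X @ beta + noise_std * rng.normal(size=n_train)

    # First layer: fixed random linear map (linear activation, never trained).
    W = rng.normal(size=(n_features, n_params)) / np.sqrt(n_features)
    Z = X @ W  # hidden features, shape (n_train, n_params)

    # Second layer: minimum-norm least-squares fit of the n_params weights.
    w_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)

    train_err = np.mean((Z @ w_hat - y) ** 2)

    # Fresh samples from the same assumed distribution for the test error.
    X_test = rng.normal(size=(n_test, n_features))
    y_test = X_test @ beta + noise_std * rng.normal(size=n_test)
    test_err = np.mean((X_test @ W @ w_hat - y_test) ** 2)
    return train_err, test_err


if __name__ == "__main__":
    for n_params in (20, 200):
        train_err, test_err = fit_random_linear_features(n_params=n_params)
        print(f"n_params={n_params}: train={train_err:.3e}, test={test_err:.3e}")
```

In this sketch the hidden-feature matrix `Z = X @ W` has rank at most `min(n_train, n_features, n_params)`, so the training error can only reach zero when both `n_features` and `n_params` are at least `n_train`. That is one simple way to see why an interpolation regime can be entered along two different directions, as the abstract describes.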

Paper title