Paper Title
Overparameterized random feature regression with nearly orthogonal data
Paper Authors
Paper Abstract
We investigate the properties of random feature ridge regression (RFRR) given by a two-layer neural network with random Gaussian initialization. We study the non-asymptotic behavior of RFRR with nearly orthogonal deterministic unit-length input data vectors in the overparameterized regime, where the width of the first layer is much larger than the sample size. Our analysis shows high-probability non-asymptotic concentration results for the training error, cross-validation, and generalization error of RFRR, centered around their respective values for a kernel ridge regression (KRR). This KRR is derived from the expected kernel generated by the nonlinear random feature map. We then approximate the performance of the KRR by a polynomial kernel matrix obtained from the Hermite polynomial expansion of the activation function, whose degree depends only on the orthogonality among the data points. This polynomial kernel determines the asymptotic behavior of both RFRR and KRR. Our results hold for a wide variety of activation functions and input data sets that exhibit nearly orthogonal properties. Based on these approximations, we obtain a lower bound on the generalization error of RFRR for a nonlinear student-teacher model.
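A rough numerical sketch may help make the concentration statement concrete. The snippet below is illustrative only, not the paper's experiments: it compares RFRR with ReLU random features against KRR with the corresponding expected kernel, which for ReLU features and unit-length inputs is the degree-1 arc-cosine kernel of Cho & Saul (2009). The sample size, dimension, width, ridge level, and tanh teacher are all assumptions made for this demonstration.

```python
# Minimal sketch (assumed parameters, not from the paper): with width N >> n,
# RFRR train/test errors should concentrate around those of KRR with the
# expected kernel E_w[relu(w.x) relu(w.x')].
import numpy as np

rng = np.random.default_rng(0)
d, n, N, lam = 400, 100, 20_000, 1e-2  # dimension, samples, width, ridge level

relu = lambda z: np.maximum(z, 0.0)

# Unit-length inputs: i.i.d. normalized Gaussian vectors in high dimension
# are nearly orthogonal (|<x_i, x_j>| = O(1/sqrt(d)) with high probability).
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
Xt = rng.standard_normal((n, d))
Xt /= np.linalg.norm(Xt, axis=1, keepdims=True)

# Hypothetical nonlinear teacher y = tanh(<x, beta>); since ||x|| = 1 and
# beta ~ N(0, I_d), the pre-activation <x, beta> is O(1).
beta = rng.standard_normal(d)
y, yt = np.tanh(X @ beta), np.tanh(Xt @ beta)

# --- RFRR: random Gaussian first layer, ridge regression on the features,
# written in kernel form via the empirical feature Gram matrix.
W = rng.standard_normal((N, d))
F = relu(X @ W.T) / np.sqrt(N)
Ft = relu(Xt @ W.T) / np.sqrt(N)
K_rf, K_rf_t = F @ F.T, Ft @ F.T
alpha_rf = np.linalg.solve(K_rf + lam * np.eye(n), y)

# --- KRR with the expected kernel: for relu and unit vectors this is the
# degree-1 arc-cosine kernel (Cho & Saul, 2009).
def arccos1(A, B):
    G = np.clip(A @ B.T, -1.0, 1.0)  # inner products of unit vectors
    th = np.arccos(G)
    return (np.sin(th) + (np.pi - th) * G) / (2 * np.pi)

K, Kt = arccos1(X, X), arccos1(Xt, X)
alpha_k = np.linalg.solve(K + lam * np.eye(n), y)

print("train err, RFRR vs KRR:",
      np.mean((K_rf @ alpha_rf - y) ** 2), np.mean((K @ alpha_k - y) ** 2))
print("test  err, RFRR vs KRR:",
      np.mean((K_rf_t @ alpha_rf - yt) ** 2), np.mean((Kt @ alpha_k - yt) ** 2))
```

Increasing N with n and d fixed should shrink the gap between the RFRR and KRR errors, mirroring the concentration of the random feature Gram matrix around the expected kernel described in the abstract.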