Paper Title
Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data
Paper Authors
Paper Abstract
Benign overfitting, the phenomenon where interpolating models generalize well in the presence of noisy data, was first observed in neural network models trained with gradient descent. To better understand this empirical observation, we consider the generalization error of two-layer neural networks trained to interpolation by gradient descent on the logistic loss following random initialization. We assume the data comes from well-separated class-conditional log-concave distributions and allow for a constant fraction of the training labels to be corrupted by an adversary. We show that in this setting, neural networks exhibit benign overfitting: they can be driven to zero training error, perfectly fitting any noisy training labels, and simultaneously achieve minimax optimal test error. In contrast to previous work on benign overfitting that requires linear or kernel-based predictors, our analysis holds in a setting where both the model and learning dynamics are fundamentally nonlinear.
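To make the setting concrete, below is a minimal sketch in Python/NumPy of the kind of setup the abstract describes: class-conditional Gaussian data (one log-concave instance) with a constant fraction of labels flipped, and a two-layer network whose first layer is trained by full-batch gradient descent on the logistic loss. The specific distribution, the fixed ±1 second-layer weights, the leaky-ReLU activation, and all hyperparameters are illustrative assumptions, not the paper's exact construction or experiments.

```python
import numpy as np

# Illustrative sketch (not the paper's exact construction): noisy linearly
# separable data, a corrupted fraction of training labels, and a two-layer
# network trained to interpolation by gradient descent on the logistic loss.
rng = np.random.default_rng(0)

n, d, m = 100, 500, 50            # samples, input dimension, hidden width
noise_rate, lr, steps = 0.1, 0.1, 2000
alpha = 0.1                       # leaky-ReLU slope (illustrative activation)

# Class-conditional Gaussian data: x = y * mu + Gaussian noise, y in {-1, +1}.
mu = 2.0 * np.ones(d) / np.sqrt(d)
y_clean = rng.choice([-1.0, 1.0], size=n)
X = y_clean[:, None] * mu + rng.standard_normal((n, d))

# Corrupt a constant fraction of the training labels.
flip = rng.random(n) < noise_rate
y = np.where(flip, -y_clean, y_clean)

# Two-layer network f(x) = (1/m) * sum_j a_j * phi(<w_j, x>):
# random first layer W (trained), second-layer signs a_j fixed.
W = rng.standard_normal((m, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=m)

def phi(z):
    return np.where(z > 0, z, alpha * z)

def phi_prime(z):
    return np.where(z > 0, 1.0, alpha)

def f(X, W):
    return (phi(X @ W.T) * a).mean(axis=1)

for _ in range(steps):
    margins = y * f(X, W)
    # Logistic loss l(z) = log(1 + exp(-z)); derivative l'(z) = -1 / (1 + exp(z)).
    g = -1.0 / (1.0 + np.exp(margins))
    pre = X @ W.T                                  # (n, m) pre-activations
    # Gradient of the empirical logistic loss with respect to W.
    grad_W = ((g[:, None] * y[:, None] * a[None, :] / m) * phi_prime(pre)).T @ X / n
    W -= lr * grad_W

# Training error on the noisy labels (typically driven to zero, i.e. interpolation).
train_err = np.mean(np.sign(f(X, W)) != y)

# Test error against clean labels drawn from the same distribution.
y_test = rng.choice([-1.0, 1.0], size=2000)
X_test = y_test[:, None] * mu + rng.standard_normal((2000, d))
test_err = np.mean(np.sign(f(X_test, W)) != y_test)
print(f"train error (noisy labels): {train_err:.3f}, test error (clean): {test_err:.3f}")
```

The point of the sketch is only to show the two quantities the abstract contrasts: the training error measured against the corrupted labels (driven to zero, so the network overfits the noise) and the test error measured against clean labels (which can nevertheless remain small).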