Paper Title

Rethinking Bias-Variance Trade-off for Generalization of Neural Networks

Paper Authors

Zitong Yang, Yaodong Yu, Chong You, Jacob Steinhardt, Yi Ma

Paper Abstract

The classical bias-variance trade-off predicts that bias decreases and variance increases with model complexity, leading to a U-shaped risk curve. Recent work calls this into question for neural networks and other over-parameterized models, for which it is often observed that larger models generalize better. We provide a simple explanation for this by measuring the bias and variance of neural networks: while the bias is monotonically decreasing as in the classical theory, the variance is unimodal or bell-shaped: it increases and then decreases with the width of the network. We vary the network architecture, loss function, and choice of dataset and confirm that variance unimodality occurs robustly for all models we considered. The risk curve is the sum of the bias and variance curves and displays different qualitative shapes depending on the relative scale of bias and variance, with the double descent curve observed in recent literature as a special case. We corroborate these empirical results with a theoretical analysis of two-layer linear networks with a random first layer. Finally, evaluation on out-of-distribution data shows that most of the drop in accuracy comes from increased bias, while variance increases by a relatively small amount. Moreover, we find that deeper models decrease bias and increase variance for both in-distribution and out-of-distribution data.
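As a rough, self-contained illustration of the decomposition described above, the sketch below estimates squared bias and variance in the setting the abstract's theory refers to: a two-layer linear network whose first layer is random and fixed, with only the second layer fit by minimum-norm least squares. The Gaussian data model, the problem sizes, and the estimator (retraining on freshly drawn training sets and random first layers, then decomposing the test error) are illustrative assumptions, not the authors' experimental protocol.

```python
# Minimal sketch (not the paper's exact estimator): bias-variance decomposition
# for a two-layer linear network with a random, fixed first layer. Bias and
# variance are estimated by retraining on many independent training sets.
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test, n_trials, noise_std = 30, 40, 200, 100, 0.1  # illustrative sizes
theta = rng.normal(size=d) / np.sqrt(d)          # ground-truth linear function
X_test = rng.normal(size=(n_test, d))
y_test_clean = X_test @ theta                    # noiseless targets at the test points

def fit_predict(width):
    """Fit the second layer by minimum-norm least squares on top of a random first layer."""
    W = rng.normal(size=(d, width)) / np.sqrt(d)  # random fixed first layer (resampled per trial)
    X = rng.normal(size=(n_train, d))
    y = X @ theta + noise_std * rng.normal(size=n_train)
    beta = np.linalg.pinv(X @ W) @ y              # min-norm solution; handles over-parameterized widths
    return (X_test @ W) @ beta

for width in [5, 10, 20, 40, 80, 160]:
    preds = np.stack([fit_predict(width) for _ in range(n_trials)])  # (n_trials, n_test)
    mean_pred = preds.mean(axis=0)
    bias2 = np.mean((mean_pred - y_test_clean) ** 2)   # squared bias, averaged over test points
    variance = np.mean(preds.var(axis=0))              # variance over training sets (and first layers)
    print(f"width={width:4d}  bias^2={bias2:.4f}  variance={variance:.4f}  risk≈{bias2 + variance:.4f}")
```

With these settings the printed values typically show bias falling as the width grows while variance peaks near the interpolation threshold (width ≈ number of training points) and then decreases, mirroring the unimodal variance curve the abstract describes; note that the variance here reflects randomness in both the training sample and the random first layer.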
