Paper Title

Neural networks trained with SGD learn distributions of increasing complexity

Paper Authors

Maria Refinetti, Alessandro Ingrosso, Sebastian Goldt

Paper Abstract

The ability of deep neural networks to generalise well even when they interpolate their training data has been explained using various "simplicity biases". These theories postulate that neural networks avoid overfitting by first learning simple functions, say a linear classifier, before learning more complex, non-linear functions. Meanwhile, data structure is also recognised as a key ingredient for good generalisation, yet its role in simplicity biases is not yet understood. Here, we show that neural networks trained using stochastic gradient descent initially classify their inputs using lower-order input statistics, like mean and covariance, and exploit higher-order statistics only later during training. We first demonstrate this distributional simplicity bias (DSB) in a solvable model of a neural network trained on synthetic data. We empirically demonstrate DSB in a range of deep convolutional networks and visual transformers trained on CIFAR10, and show that it even holds in networks pre-trained on ImageNet. We discuss the relation of DSB to other simplicity biases and consider its implications for the principle of Gaussian universality in learning.
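The abstract's core empirical test can be illustrated with a small probe: evaluate a partially trained classifier on a "Gaussian clone" of its training set, i.e. synthetic inputs that match only the class-wise mean and covariance of the real data. Under the distributional simplicity bias, accuracy on the clone should track accuracy on the real data early in training and diverge later, once the network exploits higher-order statistics. The sketch below is illustrative only and is not the authors' code; the helper names (`make_gaussian_clone`, `accuracy`) and the flattened-image Gaussian sampling are assumptions made for clarity.

```python
# Minimal sketch of a DSB probe: compare accuracy on real data vs. a
# class-wise Gaussian clone that matches only mean and covariance.
import numpy as np
import torch


def make_gaussian_clone(X: np.ndarray, y: np.ndarray, seed: int = 0):
    """Sample synthetic inputs from per-class Gaussians whose mean and
    covariance match the (flattened) real inputs of that class."""
    rng = np.random.default_rng(seed)
    X_flat = X.reshape(len(X), -1).astype(np.float64)
    X_clone = np.empty_like(X_flat)
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        mu = X_flat[idx].mean(axis=0)
        # small ridge keeps the empirical covariance positive semi-definite
        cov = np.cov(X_flat[idx], rowvar=False) + 1e-6 * np.eye(X_flat.shape[1])
        X_clone[idx] = rng.multivariate_normal(mu, cov, size=len(idx))
    return X_clone.reshape(X.shape).astype(np.float32), y


@torch.no_grad()
def accuracy(model: torch.nn.Module, X: np.ndarray, y: np.ndarray) -> float:
    """Top-1 accuracy of `model` on numpy inputs and integer labels."""
    logits = model(torch.from_numpy(X))
    return float((logits.argmax(dim=1).numpy() == y).mean())


# During training, one would periodically compare
#   accuracy(model, X_real, y)   vs.   accuracy(model, X_clone, y)
# and look for the epoch at which the two curves separate.
```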
