Paper Title

Implicit Bias of MSE Gradient Optimization in Underparameterized Neural Networks

Authors

Benjamin Bowman, Guido Montufar

Abstract

We study the dynamics of a neural network in function space when optimizing the mean squared error via gradient flow. We show that in the underparameterized regime the network learns eigenfunctions of an integral operator $T_{K^\infty}$ determined by the Neural Tangent Kernel (NTK) at rates corresponding to their eigenvalues. For example, for uniformly distributed data on the sphere $S^{d-1}$ and rotation invariant weight distributions, the eigenfunctions of $T_{K^\infty}$ are the spherical harmonics. Our results can be understood as describing a spectral bias in the underparameterized regime. The proofs use the concept of "Damped Deviations", where deviations of the NTK matter less for eigendirections with large eigenvalues due to the occurrence of a damping factor. Aside from the underparameterized regime, the damped deviations point of view can be used to track the dynamics of the empirical risk in the overparameterized setting, allowing us to extend certain results in the literature. We conclude that damped deviations offer a simple and unifying perspective on the dynamics when optimizing the squared error.
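To make the rate claim and the damping factor concrete, here is a minimal sketch of the idealized function-space dynamics. The notation beyond $T_{K^\infty}$ (the target $f^*$, the residual $r_t$, and the eigenpairs $(\lambda_i, \phi_i)$) is our shorthand for illustration, not a restatement of the paper's theorems. If the kernel were frozen at $K^\infty$, gradient flow on the squared error would drive the residual $r_t = u_t - f^*$ as

$$\frac{d}{dt} r_t = -T_{K^\infty} r_t, \qquad r_t = e^{-t\, T_{K^\infty}} r_0,$$

so along each eigenfunction $\phi_i$ of $T_{K^\infty}$ with eigenvalue $\lambda_i$ the error decays at a rate set by the eigenvalue:

$$\langle r_t, \phi_i \rangle = e^{-\lambda_i t}\, \langle r_0, \phi_i \rangle.$$

For the time-varying kernel $K_t$ of an actual network, where $\frac{d}{dt} r_t = -T_{K_t} r_t$, variation of constants (Duhamel's formula) gives

$$r_t = e^{-t\, T_{K^\infty}} r_0 + \int_0^t e^{-(t-s)\, T_{K^\infty}} \bigl(T_{K^\infty} - T_{K_s}\bigr) r_s \, ds.$$

Projecting the integral term onto $\phi_i$ multiplies the kernel deviation at time $s$ by the factor $e^{-\lambda_i (t-s)}$, which illustrates why deviations of the NTK matter less along eigendirections with large eigenvalues.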
