Paper Title


Phenomenology of Double Descent in Finite-Width Neural Networks

Authors

Sidak Pal Singh, Aurelien Lucchi, Thomas Hofmann, Bernhard Schölkopf

Abstract

`Double descent' delineates the generalization behaviour of models depending on the regime they belong to: under- or over-parameterized. The current theoretical understanding behind the occurrence of this phenomenon is primarily based on linear and kernel regression models -- with informal parallels to neural networks via the Neural Tangent Kernel. Therefore, such analyses do not adequately capture the mechanisms behind double descent in finite-width neural networks, and disregard crucial components -- such as the choice of the loss function. We address these shortcomings by leveraging influence functions in order to derive suitable expressions of the population loss and its lower bound, while imposing minimal assumptions on the form of the parametric model. Our derived bounds bear an intimate connection with the spectrum of the Hessian at the optimum, and importantly, exhibit a double descent behaviour at the interpolation threshold. Building on our analysis, we further investigate how the loss function affects double descent -- and thus uncover interesting properties of neural networks and their Hessian spectra near the interpolation threshold.
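The phenomenon the abstract studies can be reproduced in miniature. The following sketch (not the paper's method, and all names here are illustrative) fits a random-features least-squares model with the minimum-norm solution and sweeps the number of features `p` past the interpolation threshold `p = n_train`, where test error typically peaks before descending again:

```python
import numpy as np

def random_features_test_error(n_train=50, n_test=500, d=20, p=10,
                               noise=0.5, seed=0):
    """Test MSE of a minimum-norm random-features fit (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Linear "teacher" generating the data, observed with label noise.
    w_star = rng.standard_normal(d)
    X_tr = rng.standard_normal((n_train, d))
    X_te = rng.standard_normal((n_test, d))
    y_tr = X_tr @ w_star + noise * rng.standard_normal(n_train)
    y_te = X_te @ w_star
    # Fixed random ReLU features of width p (the model's parameter count).
    W = rng.standard_normal((d, p)) / np.sqrt(d)
    F_tr = np.maximum(X_tr @ W, 0.0)
    F_te = np.maximum(X_te @ W, 0.0)
    # Minimum-norm least-squares solution; interpolates once p >= n_train.
    beta = np.linalg.pinv(F_tr) @ y_tr
    return np.mean((F_te @ beta - y_te) ** 2)

widths = [5, 25, 50, 75, 200]   # 50 = interpolation threshold here
errors = [np.mean([random_features_test_error(p=p, seed=s)
                   for s in range(20)])
          for p in widths]
for p, e in zip(widths, errors):
    print(f"p={p:4d}  test MSE={e:.3f}")
```

Averaged over seeds, the error at `p = n_train` is typically far larger than in the heavily over-parameterized regime, giving the characteristic second descent. This linear-model picture is exactly the kind of proxy the paper argues is insufficient for finite-width networks, but it makes the shape of the curve concrete.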
