Paper Title

Improved Convergence Guarantees for Shallow Neural Networks

Paper Authors

Razborov, Alexander

Paper Abstract

We continue a long line of research aimed at proving convergence of depth 2 neural networks, trained via gradient descent, to a global minimum. As in many previous works, our model has the following features: regression with quadratic loss function, fully connected feedforward architecture, ReLU activations, Gaussian data instances and network initialization, adversarial labels. It is more general in the sense that we allow both layers to be trained simultaneously and at {\em different} rates. Our results improve on state-of-the-art [Oymak Soltanolkotabi 20] (training the first layer only) and [Nguyen 21, Section 3.2] (training both layers with Le Cun's initialization). We also report several simple experiments with synthetic data. They strongly suggest that, at least in our model, the convergence phenomenon extends well beyond the ``NTK regime''.
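To make the training setup described in the abstract concrete, here is a minimal sketch of a depth 2 fully connected ReLU network for regression with quadratic loss, Gaussian data and initialization, and both layers updated simultaneously by gradient descent at different rates. It is not the paper's code; the dimensions, learning rates (eta_W, eta_a), and all other hyperparameter values are illustrative assumptions.

```python
# Sketch only: two-layer ReLU regression trained by gradient descent,
# with the two layers updated at different rates. Hyperparameters are assumed.
import numpy as np

rng = np.random.default_rng(0)

d, m, n = 10, 200, 100            # input dim, hidden width, number of samples
X = rng.standard_normal((n, d))   # Gaussian data instances
y = rng.standard_normal(n)        # labels (adversarial in the paper; random here)

W = rng.standard_normal((m, d)) / np.sqrt(d)  # first layer, Gaussian initialization
a = rng.standard_normal(m) / np.sqrt(m)       # second layer, Gaussian initialization

eta_W, eta_a = 1e-2, 1e-3         # different learning rates for the two layers

def forward(X, W, a):
    H = np.maximum(X @ W.T, 0.0)  # ReLU activations, shape (n, m)
    return H, H @ a               # predictions, shape (n,)

for step in range(2000):
    H, pred = forward(X, W, a)
    r = pred - y                  # residuals
    loss = 0.5 * np.mean(r ** 2)  # quadratic loss

    # Gradients of the quadratic loss with respect to both layers.
    grad_a = H.T @ r / n                             # shape (m,)
    grad_W = ((r[:, None] * (H > 0)) * a).T @ X / n  # shape (m, d)

    # Simultaneous gradient descent steps on both layers.
    a -= eta_a * grad_a
    W -= eta_W * grad_W

print(f"final training loss: {loss:.6f}")
```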
