Paper Title

Why Neural Networks Work

Paper Authors

Mukherjee, Sayandev, Huberman, Bernardo A.

Paper Abstract

We argue that many properties of fully-connected feedforward neural networks (FCNNs), also called multi-layer perceptrons (MLPs), are explainable from the analysis of a single pair of operations, namely a random projection into a higher-dimensional space than the input, followed by a sparsification operation. For convenience, we call this pair of successive operations expand-and-sparsify following the terminology of Dasgupta. We show how expand-and-sparsify can explain the observed phenomena that have been discussed in the literature, such as the so-called Lottery Ticket Hypothesis, the surprisingly good performance of randomly-initialized untrained neural networks, the efficacy of Dropout in training and most importantly, the mysterious generalization ability of overparameterized models, first highlighted by Zhang et al. and subsequently identified even in non-neural network models by Belkin et al.
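
To make the expand-and-sparsify pair of operations concrete, below is a minimal illustrative sketch (not taken from the paper): it assumes a Gaussian random projection into a higher-dimensional space followed by a top-k, winner-take-all sparsification rule, one common instantiation in Dasgupta's framework. The function name `expand_and_sparsify` and all parameter choices are hypothetical.

```python
import numpy as np

def expand_and_sparsify(x, expansion_dim, k, seed=0):
    """Illustrative sketch of the expand-and-sparsify pair of operations:
    (1) a random projection of the input x into a higher-dimensional space,
    (2) a sparsification of the projected vector.

    Here sparsification keeps only the k largest coordinates (winner-take-all);
    other rules, e.g. thresholding, would also fit the general description.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Random projection matrix: rows = expanded dimension, cols = input dimension.
    W = rng.standard_normal((expansion_dim, x.shape[0]))
    y = W @ x                     # expand: project into the higher-dimensional space
    out = np.zeros_like(y)
    top_k = np.argsort(y)[-k:]    # indices of the k largest activations
    out[top_k] = y[top_k]         # sparsify: zero out all other coordinates
    return out

# Example: expand a 10-dimensional input into 200 dimensions, keep 5 activations.
x = np.random.default_rng(1).standard_normal(10)
print(expand_and_sparsify(x, expansion_dim=200, k=5))
```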
