Paper Title

Training Linear Neural Networks: Non-Local Convergence and Complexity Results

Authors

Eftekhari, Armin

Abstract

Linear networks provide valuable insights into the workings of neural networks in general. This paper identifies conditions under which the gradient flow provably trains a linear network, in spite of the non-strict saddle points present in the optimization landscape. This paper also provides the computational complexity of training linear networks with gradient flow. To achieve these results, this work develops a machinery to provably identify the stable set of gradient flow, which then enables us to improve over the state of the art in the literature of linear networks (Bah et al., 2019; Arora et al., 2018a). Crucially, our results appear to be the first to break away from the lazy training regime which has dominated the literature of neural networks. This work requires the network to have a layer with one neuron, which subsumes the networks with a scalar output, but extending the results of this theoretical work to all linear networks remains a challenging open problem.
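
To make the setting concrete, here is a minimal, hypothetical sketch (not the paper's code or analysis) of the object the abstract discusses: gradient flow on the squared loss of a two-layer linear network whose output layer has a single neuron, so the network has a scalar output. The gradient flow ODE dW/dt = -∇L(W) is approximated by forward Euler steps; all dimensions, the step size, and the random data are illustrative assumptions.

```python
import numpy as np

# Hypothetical illustration: gradient flow on a two-layer linear network
# f(x) = w2 @ W1 @ x with a scalar output (the output layer has one neuron,
# matching the paper's assumption of a layer with a single neuron).
# The continuous-time flow dW/dt = -grad L(W) is simulated by forward Euler.

rng = np.random.default_rng(0)

n, d = 50, 10                        # samples, input dimension (assumed)
h = 5                                # hidden width (assumed)
X = rng.normal(size=(d, n))          # synthetic inputs
y = rng.normal(size=(1, n))          # synthetic scalar targets

W1 = 0.1 * rng.normal(size=(h, d))   # first layer
w2 = 0.1 * rng.normal(size=(1, h))   # output layer: one neuron -> scalar output

dt, steps = 1e-3, 20_000             # Euler step size and horizon (assumed)
for _ in range(steps):
    r = w2 @ W1 @ X - y              # residual of the end-to-end linear map
    # Gradients of L(W1, w2) = 0.5 * ||w2 W1 X - y||_F^2
    g1 = w2.T @ r @ X.T              # dL/dW1, shape (h, d)
    g2 = r @ (W1 @ X).T              # dL/dw2, shape (1, h)
    W1 -= dt * g1                    # Euler step approximating gradient flow
    w2 -= dt * g2

print("final loss:", 0.5 * np.linalg.norm(w2 @ W1 @ X - y) ** 2)
```

Note that the loss is non-convex in (W1, w2) even though the end-to-end map w2 @ W1 is linear; the saddle points of this landscape, including non-strict ones, are what make the convergence analysis in the paper nontrivial.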
