Paper Title

Global Convergence and Geometric Characterization of Slow to Fast Weight Evolution in Neural Network Training for Classifying Linearly Non-Separable Data

Authors

Ziang Long, Penghang Yin, Jack Xin

Abstract

In this paper, we study the dynamics of gradient descent in learning neural networks for classification problems. Unlike existing works, we consider the linearly non-separable case where the training data of different classes lie in orthogonal subspaces. We show that when the network has a sufficient (but not excessively large) number of neurons, (1) the corresponding minimization problem has a desirable landscape where all critical points are global minima with perfect classification; (2) gradient descent is guaranteed to converge to a global minimum. Moreover, we discover a geometric condition on the network weights such that, when it is satisfied, the weight evolution transitions from a slow phase of weight direction spreading to a fast phase of weight convergence. The geometric condition states that the convex hull of the weights projected onto the unit sphere contains the origin.
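The paper provides no code, but the geometric condition in the abstract can be tested numerically: the origin lies in the convex hull of the unit-sphere projections of the weight vectors if and only if a small linear program is feasible. Below is a minimal sketch of such a check; the helper name `origin_in_convex_hull` is hypothetical, SciPy is assumed to be available, and this is an illustration of the condition rather than the authors' implementation.

```python
import numpy as np
from scipy.optimize import linprog

def origin_in_convex_hull(weights):
    """Check whether the origin lies in the convex hull of the rows of
    `weights` after each row is projected onto the unit sphere.

    Feasibility LP: find lambda >= 0 with sum(lambda) = 1 and
    sum_i lambda_i * w_i / ||w_i|| = 0.  (Hypothetical helper, not from
    the paper.)
    """
    # Project each weight vector onto the unit sphere.
    W = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    n, d = W.shape
    # Equality constraints: W^T lambda = 0 (d rows) and 1^T lambda = 1.
    A_eq = np.vstack([W.T, np.ones((1, n))])
    b_eq = np.concatenate([np.zeros(d), [1.0]])
    # Zero objective: we only care about feasibility of the constraints.
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n)
    return res.status == 0  # status 0 means a feasible point was found

# Weights that spread in opposing directions contain the origin ...
print(origin_in_convex_hull(np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])))  # True
# ... while weights clustered in one half-space do not.
print(origin_in_convex_hull(np.array([[1.0, 0.1], [1.0, -0.1], [0.9, 0.0]])))  # False
```

Intuitively, this corresponds to the phase transition described above: once the weight directions have spread enough to surround the origin, the evolution enters the fast convergence phase.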
