Paper Title
Exact Solutions of a Deep Linear Network
Paper Authors
Paper Abstract
This work derives an analytical expression for the global minima of a deep linear network with weight decay and stochastic neurons, a fundamental model for understanding the landscape of neural networks. Our result implies that the origin is a special point in the loss landscape of deep neural networks, where highly nonlinear phenomena emerge. We show that weight decay interacts strongly with the model architecture and can create bad minima at zero in a network with more than one hidden layer, a behavior qualitatively different from that of a network with only one hidden layer. Practically, our result implies that common deep learning initialization methods are insufficient to ease the optimization of neural networks in general.
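The qualitative gap between one and more than one hidden layer can be checked numerically on a scalar toy model (an illustrative sketch of my own, not the paper's derivation; the data point `x, y` and decay strength `lam` are assumed values). With one hidden layer and a small weight decay `lam < |x*y|`, the regularized loss still decreases along the diagonal direction away from zero, so the origin is a saddle; with two hidden layers, the product of three weights contributes only a cubic term near zero, so the quadratic weight-decay penalty dominates and the origin is a local minimum for any positive `lam`.

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = 1.0, 1.0   # a single scalar data point (assumed for illustration)
lam = 0.1         # weight decay strength, chosen so that lam < |x * y|

def loss_1hidden(u, w):
    # One hidden layer: two weight factors, plus L2 weight decay.
    return (u * w * x - y) ** 2 + lam * (u**2 + w**2)

def loss_2hidden(v, u, w):
    # Two hidden layers: three weight factors, plus L2 weight decay.
    return (v * u * w * x - y) ** 2 + lam * (v**2 + u**2 + w**2)

eps = 1e-3
# One hidden layer: stepping along u = w = eps lowers the loss,
# so the origin is a saddle point when lam < |x * y|.
print(loss_1hidden(eps, eps) - loss_1hidden(0.0, 0.0))  # negative

# Two hidden layers: random small perturbations all raise the loss,
# so the origin is a local minimum for any lam > 0.
deltas = [loss_2hidden(*(eps * d)) - loss_2hidden(0.0, 0.0, 0.0)
          for d in rng.standard_normal((100, 3))]
print(min(deltas))  # positive
```

Intuitively, the data-fitting term near zero is quadratic in the weights for one hidden layer but cubic for two, so weight decay (always quadratic) controls the local behavior at the origin only in the deeper network.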