Paper Title
Feature Learning in $L_{2}$-regularized DNNs: Attraction/Repulsion and Sparsity
Paper Authors
Paper Abstract
We study the loss surface of DNNs with $L_{2}$ regularization. We show that the loss in terms of the parameters can be reformulated into a loss in terms of the layerwise activations $Z_{\ell}$ of the training set. This reformulation reveals the dynamics behind feature learning: each hidden representation $Z_{\ell}$ is optimal w.r.t. an attraction/repulsion problem and interpolates between the input and output representations, keeping as little information from the input as necessary to construct the activations of the next layer. For positively homogeneous non-linearities, the loss can be further reformulated in terms of the covariances of the hidden representations, which takes the form of a partially convex optimization over a convex cone. This second reformulation allows us to prove a sparsity result for homogeneous DNNs: any local minimum of the $L_{2}$-regularized loss can be achieved with at most $N(N+1)$ neurons in each hidden layer (where $N$ is the size of the training set). We show that this bound is tight by giving an example of a local minimum that requires $N^{2}/4$ hidden neurons. However, we also observe numerically that in more traditional settings far fewer than $N^{2}$ neurons are required to reach the minima.
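The abstract's final numerical observation (that in practice far fewer than $N^{2}$ neurons remain active) can be probed with a toy experiment. The following is a minimal sketch, not code from the paper: it trains an over-wide single-hidden-layer ReLU network with an $L_{2}$ penalty (PyTorch's `weight_decay`) and counts the hidden neurons whose incoming and outgoing weights were not driven to zero. The data, width, learning rate, number of steps, and the $10^{-4}$ threshold are all illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N, d_in, width, d_out = 16, 5, 512, 1      # width chosen larger than the N(N+1) bound
X, Y = torch.randn(N, d_in), torch.randn(N, d_out)

# Homogeneous (ReLU, bias-free) network, as in the paper's setting.
model = nn.Sequential(nn.Linear(d_in, width, bias=False),
                      nn.ReLU(),
                      nn.Linear(width, d_out, bias=False))

lam = 1e-3                                  # L2 regularization strength (illustrative)
opt = torch.optim.Adam(model.parameters(), lr=1e-2, weight_decay=lam)
loss_fn = nn.MSELoss()

for step in range(20_000):
    opt.zero_grad()
    loss_fn(model(X), Y).backward()
    opt.step()

# A hidden neuron counts as "active" if neither its incoming weight row nor its
# outgoing weight column has been pushed to (near) zero by the regularizer.
W1, W2 = model[0].weight, model[2].weight   # shapes: (width, d_in), (d_out, width)
active = ((W1.norm(dim=1) * W2.norm(dim=0)) > 1e-4).sum().item()
print(f"active hidden neurons: {active} / {width}  (bound N(N+1) = {N * (N + 1)})")
```

In this kind of toy run the number of active neurons typically ends up well below both the width and the $N(N+1)$ bound, which is the qualitative behaviour the abstract alludes to; the exact count depends on the data, the regularization strength, and the activity threshold.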