Paper Title

Inductive Bias of Gradient Descent for Weight Normalized Smooth Homogeneous Neural Nets

Paper Authors

Depen Morwani, Harish G. Ramaswamy

Paper Abstract

We analyze the inductive bias of gradient descent for weight normalized smooth homogeneous neural nets, when trained on exponential or cross-entropy loss. We analyze both standard weight normalization (SWN) and exponential weight normalization (EWN), and show that the gradient flow path with EWN is equivalent to gradient flow on standard networks with an adaptive learning rate. We extend these results to gradient descent, and establish asymptotic relations between weights and gradients for both SWN and EWN. We also show that EWN causes weights to be updated in a way that prefers asymptotic relative sparsity. For EWN, we provide a finite-time convergence rate of the loss with gradient flow and a tight asymptotic convergence rate with gradient descent. We demonstrate our results for SWN and EWN on synthetic datasets. Experimental results on simple datasets support our claim on sparse EWN solutions, even with SGD. This demonstrates its potential application in learning neural networks amenable to pruning.
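For readers unfamiliar with the two parameterizations named in the abstract, the minimal sketch below contrasts them. The per-neuron forms gamma * v/||v|| (SWN) and exp(alpha) * v/||v|| (EWN), as well as the function names, are illustrative assumptions for this page, not code from the paper.

```python
import numpy as np

# Illustrative sketch (assumed forms, not the paper's implementation):
#   SWN:  w = gamma * v / ||v||        (standard weight normalization)
#   EWN:  w = exp(alpha) * v / ||v||   (exponential weight normalization)
# Both reparameterize a weight vector into a direction v and a scale,
# differing only in how the scale is parameterized.

def swn_weight(gamma: float, v: np.ndarray) -> np.ndarray:
    """Standard weight normalization: scale the unit-norm direction by gamma."""
    return gamma * v / np.linalg.norm(v)

def ewn_weight(alpha: float, v: np.ndarray) -> np.ndarray:
    """Exponential weight normalization: scale the unit-norm direction by exp(alpha)."""
    return np.exp(alpha) * v / np.linalg.norm(v)

v = np.array([3.0, 4.0])
print(swn_weight(2.0, v))           # [1.2, 1.6]
print(ewn_weight(np.log(2.0), v))   # same effective weight: [1.2, 1.6]
```

Although the two parameterizations can represent the same effective weights (as in the example above), the abstract's point is that gradient descent traces different optimization paths through them, with EWN biased toward relatively sparse solutions.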
