Paper Title
Quasi-Newton's method in the class gradient defined high-curvature subspace
Paper Authors
Paper Abstract
Classification problems using deep learning have been shown to have a high-curvature subspace in the loss landscape whose dimension equals the number of classes. Moreover, this subspace corresponds to the subspace spanned by the logit gradients for each class. An obvious strategy to speed up optimisation would be to use Newton's method in the high-curvature subspace and stochastic gradient descent in the complementary co-space. We show that a naive implementation actually slows down convergence, and we speculate as to why this is.
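
As a rough illustration of the strategy the abstract describes, here is a minimal sketch of one such hybrid update, assuming access to the per-class logit gradients and a Hessian-vector-product oracle. All names here (`hybrid_step`, `hess_vp`, `damping`) are hypothetical illustration, not the paper's implementation.

```python
import numpy as np

def hybrid_step(params, grad, logit_grads, hess_vp, lr=0.01, damping=1e-4):
    """One hybrid update (illustrative sketch, not the paper's code):
    a Newton-like step inside the subspace spanned by the per-class
    logit gradients, and plain gradient descent in its complement.

    params      : (d,) current parameter vector
    grad        : (d,) stochastic gradient of the loss
    logit_grads : (k, d) rows are the gradients of each class logit
    hess_vp     : callable v -> H @ v, a Hessian-vector-product oracle
    """
    # Orthonormal basis Q (d, k) for the high-curvature subspace.
    Q, _ = np.linalg.qr(logit_grads.T)

    # Restrict the Hessian to the subspace: H_sub = Q^T H Q (k x k).
    HQ = np.column_stack([hess_vp(Q[:, i]) for i in range(Q.shape[1])])
    H_sub = Q.T @ HQ

    # Damped Newton step for the in-subspace gradient component.
    g_sub = Q.T @ grad
    newton = Q @ np.linalg.solve(H_sub + damping * np.eye(len(g_sub)), g_sub)

    # SGD step for the gradient component in the complementary co-space.
    sgd = lr * (grad - Q @ g_sub)

    return params - newton - sgd
```

The damping term guards against an ill-conditioned restricted Hessian. Note that, per the abstract, a naive hybrid of this kind was observed to slow convergence rather than speed it up.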