论文标题
惩罚梯度规范以有效改善深度学习的概括
Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning
论文作者
论文摘要
如何训练深层神经网络(DNN)良好的概括是深度学习的核心问题,尤其是对于当今严重的过度参数化网络。在本文中,我们提出了一种有效的方法来通过对优化过程中损失函数的梯度规范进行惩罚,以改善模型概括。我们证明,限制损失功能的梯度规范可以帮助引导优化者找到平坦的最小值。我们利用一阶近似来有效地实现相应的梯度,以适应梯度下降框架。在我们的实验中,我们确认使用我们的方法时,可以在不同的数据集中改善各种模型的概括性能。另外,我们表明,最近的清晰度最小化方法(Foret等,2021)是我们方法的特殊情况,但不是最好的情况,我们方法的最佳情况可以在这些任务上给出新的最新性能。代码可从{https://github.com/zhaoyang-0204/gnp}获得。
How to train deep neural networks (DNNs) to generalize well is a central concern in deep learning, especially for severely overparameterized networks nowadays. In this paper, we propose an effective method to improve the model generalization by additionally penalizing the gradient norm of loss function during optimization. We demonstrate that confining the gradient norm of loss function could help lead the optimizers towards finding flat minima. We leverage the first-order approximation to efficiently implement the corresponding gradient to fit well in the gradient descent framework. In our experiments, we confirm that when using our methods, generalization performance of various models could be improved on different datasets. Also, we show that the recent sharpness-aware minimization method (Foret et al., 2021) is a special, but not the best, case of our method, where the best case of our method could give new state-of-art performance on these tasks. Code is available at {https://github.com/zhaoyang-0204/gnp}.