Paper Title

Efficient Generalization Improvement Guided by Random Weight Perturbation

Paper Authors

Tao Li, Weihao Yan, Zehao Lei, Yingwen Wu, Kun Fang, Ming Yang, Xiaolin Huang

Paper Abstract

To fully uncover the great potential of deep neural networks (DNNs), various learning algorithms have been developed to improve the model's generalization ability. Recently, sharpness-aware minimization (SAM) establishes a generic scheme for generalization improvements by minimizing the sharpness measure within a small neighborhood and achieves state-of-the-art performance. However, SAM requires two consecutive gradient evaluations for solving the min-max problem and inevitably doubles the training time. In this paper, we resort to filter-wise random weight perturbations (RWP) to decouple the nested gradients in SAM. Different from the small adversarial perturbations in SAM, RWP is softer and allows a much larger magnitude of perturbations. Specifically, we jointly optimize the loss function with random perturbations and the original loss function: the former guides the network towards a wider flat region while the latter helps recover the necessary local information. These two loss terms are complementary to each other and mutually independent. Hence, the corresponding gradients can be efficiently computed in parallel, enabling nearly the same training speed as regular training. As a result, we achieve very competitive performance on CIFAR and remarkably better performance on ImageNet (e.g., $\mathbf{+1.1\%}$) compared with SAM, while requiring only half of the training time. The code is released at https://github.com/nblt/RWP.
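
To make the joint objective concrete, below is a minimal PyTorch sketch of one RWP-style training step, written from the abstract's description alone (the released repository is the authoritative implementation). The function name `rwp_step`, the mixing weight `alpha`, and the perturbation scale `gamma` are illustrative assumptions, not names or values taken from the paper. For clarity the sketch computes the two gradients sequentially on the same batch; in the paper's setting the two terms are independent, so the backward passes can run in parallel on separate devices, which is where the near-regular training speed comes from.

```python
# A minimal sketch of one RWP-style step, assuming `model`, `optimizer`,
# `loss_fn`, and a batch (x, y) are already defined. `alpha` (gradient
# mixing weight) and `gamma` (perturbation scale) are hypothetical
# hyperparameters chosen for illustration.
import torch

def rwp_step(model, optimizer, loss_fn, x, y, alpha=0.5, gamma=0.01):
    # --- gradient of the original loss L(w), kept for local information ---
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    grad_clean = [p.grad.detach().clone() for p in model.parameters()]

    # --- filter-wise random perturbation: Gaussian noise scaled by the
    #     norm of each filter (first dimension of multi-dim weights) ---
    perturbations = []
    with torch.no_grad():
        for p in model.parameters():
            noise = torch.randn_like(p)
            if p.dim() > 1:
                norms = p.view(p.size(0), -1).norm(dim=1)
                noise = noise * norms.view(-1, *([1] * (p.dim() - 1)))
            eps = gamma * noise
            p.add_(eps)          # move to the perturbed point w + eps
            perturbations.append(eps)

    # --- gradient of the perturbed loss L(w + eps) ---
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()

    # --- restore weights, then mix the two gradients and update ---
    with torch.no_grad():
        for p, eps in zip(model.parameters(), perturbations):
            p.sub_(eps)
        for p, g_clean in zip(model.parameters(), grad_clean):
            # grad <- alpha * grad(L(w + eps)) + (1 - alpha) * grad(L(w))
            p.grad.mul_(alpha).add_(g_clean, alpha=1 - alpha)
    optimizer.step()
```

Because the perturbed-loss gradient does not depend on the clean-loss gradient, a parallel variant would simply dispatch the two backward passes to different workers and sum the results before `optimizer.step()`.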
