Paper Title
Uncovering the Limits of Adversarial Training against Norm-Bounded Adversarial Examples
Paper Authors
Paper Abstract
Adversarial training and its variants have become de facto standards for learning robust deep neural networks. In this paper, we explore the landscape around adversarial training in a bid to uncover its limits. We systematically study the effect of different training losses, model sizes, activation functions, the addition of unlabeled data (through pseudo-labeling) and other factors on adversarial robustness. We discover that it is possible to train robust models that go well beyond state-of-the-art results by combining larger models, Swish/SiLU activations and model weight averaging. We demonstrate large improvements on CIFAR-10 and CIFAR-100 against $\ell_\infty$ and $\ell_2$ norm-bounded perturbations of size $8/255$ and $128/255$, respectively. In the setting with additional unlabeled data, we obtain an accuracy under attack of 65.88% against $\ell_\infty$ perturbations of size $8/255$ on CIFAR-10 (+6.35% with respect to prior art). Without additional data, we obtain an accuracy under attack of 57.20% (+3.46%). To test the generality of our findings and without any additional modifications, we obtain an accuracy under attack of 80.53% (+7.62%) against $\ell_2$ perturbations of size $128/255$ on CIFAR-10, and of 36.88% (+8.46%) against $\ell_\infty$ perturbations of size $8/255$ on CIFAR-100. All models are available at https://github.com/deepmind/deepmind-research/tree/master/adversarial_robustness.
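The robust models described above are trained against $\ell_\infty$ norm-bounded perturbations such as $8/255$. A minimal sketch of the inner maximization used in adversarial training — one projected-gradient (PGD) ascent step under an $\ell_\infty$ budget — is below. This is an illustrative NumPy sketch, not the paper's implementation; the random "gradient" stands in for a real model gradient, and the step size `alpha` is an assumed value.

```python
import numpy as np

def linf_pgd_step(x, x_adv, grad, eps=8/255, alpha=2/255):
    """One PGD ascent step under an l_inf budget of eps.

    x: clean input, x_adv: current adversarial iterate,
    grad: gradient of the loss w.r.t. x_adv (same shape as x).
    """
    x_adv = x_adv + alpha * np.sign(grad)      # ascend the loss
    x_adv = np.clip(x_adv, x - eps, x + eps)   # project back into the l_inf ball
    return np.clip(x_adv, 0.0, 1.0)            # keep valid pixel range

# Toy usage: a random "image" and placeholder gradients.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(3, 4, 4))
x_adv = x.copy()
for _ in range(10):
    fake_grad = rng.standard_normal(x.shape)   # stand-in for a model gradient
    x_adv = linf_pgd_step(x, x_adv, fake_grad)

# The perturbation never exceeds the budget, no matter how many steps run.
print(float(np.abs(x_adv - x).max()) <= 8/255 + 1e-9)
```

The projection step (`np.clip` onto `[x - eps, x + eps]`) is what makes the perturbation "norm-bounded": after any number of iterations, `max |x_adv - x|` stays within `eps`.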