Paper Title

Understanding (Non-)Robust Feature Disentanglement and the Relationship Between Low- and High-Dimensional Adversarial Attacks

Paper Authors

Zuowen Wang, Leo Horne

Paper Abstract

Recent work has put forth the hypothesis that adversarial vulnerability in neural networks is due to their overuse of "non-robust features" inherent in the training data. We show empirically that for PGD attacks, there is a training stage at which neural networks start relying heavily on non-robust features to boost natural accuracy. We also propose a mechanism that reduces vulnerability to PGD-style attacks: mixing a certain amount of images containing mostly "robust features" into each training batch. We then show that this improves robust accuracy while not substantially hurting natural accuracy, and that training on "robust features" boosts robust accuracy across various architectures and for different attacks. Finally, we demonstrate empirically that these "robust features" do not induce spatial invariance.
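
The batch-mixing mechanism described in the abstract can be illustrated with a minimal PyTorch-style sketch. This is an illustration under assumptions, not the authors' implementation: the robust-feature dataset (e.g., a "robustified" dataset in the style of Ilyas et al.), the loader names, and the `mix_fraction` value are hypothetical placeholders.

```python
# Minimal sketch, assuming a standard PyTorch training pipeline.
# `natural_loader` and `robust_loader` are hypothetical DataLoaders over
# the natural dataset and a robust-feature dataset, respectively.
import torch

def mixed_batches(natural_loader, robust_loader, mix_fraction=0.25):
    """Yield batches in which roughly `mix_fraction` of the images come
    from the robust-feature dataset and the rest are natural images.
    Assumes both loaders use the same batch size."""
    robust_iter = iter(robust_loader)
    for x_nat, y_nat in natural_loader:
        try:
            x_rob, y_rob = next(robust_iter)
        except StopIteration:
            # Restart the robust loader if it is exhausted first.
            robust_iter = iter(robust_loader)
            x_rob, y_rob = next(robust_iter)
        k = int(mix_fraction * x_nat.size(0))  # number of images to swap in
        x = torch.cat([x_nat[k:], x_rob[:k]], dim=0)
        y = torch.cat([y_nat[k:], y_rob[:k]], dim=0)
        perm = torch.randperm(x.size(0))  # interleave robust and natural images
        yield x[perm], y[perm]
```

The model would then be trained on these mixed batches with an ordinary loss; the abstract's claim is that this raises robust accuracy under PGD-style attacks at little cost to natural accuracy.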
