Paper Title


Universalization of any adversarial attack using very few test examples

Paper Authors

Sandesh Kamath, Amit Deshpande, K. V. Subrahmanyam, Vineeth N. Balasubramanian

Paper Abstract


Deep learning models are known to be vulnerable not only to input-dependent adversarial attacks but also to input-agnostic or universal adversarial attacks. Dezfooli et al. \cite{Dezfooli17,Dezfooli17anal} construct a universal adversarial attack on a given model by looking at a large number of training data points and the geometry of the decision boundary near them. Subsequent work \cite{Khrulkov18} constructs a universal attack by looking only at test examples and intermediate layers of the given model. In this paper, we propose a simple universalization technique that takes any input-dependent adversarial attack and constructs a universal attack by looking at only a few adversarial test examples. It requires no details of the given model and adds negligible computational overhead for universalization. We theoretically justify our universalization technique by a spectral property common to many input-dependent adversarial perturbations, e.g., gradients, Fast Gradient Sign Method (FGSM), and DeepFool. Using matrix concentration inequalities and spectral perturbation bounds, we show that the top singular vector of input-dependent adversarial directions on a small test sample gives an effective and simple universal adversarial attack. For VGG16 and VGG19 models trained on ImageNet, our simple universalization of Gradient, FGSM, and DeepFool perturbations using a test sample of 64 images gives fooling rates comparable to state-of-the-art universal attacks \cite{Dezfooli17,Khrulkov18} for reasonable perturbation norms. Code is available at https://github.com/ksandeshk/svd-uap.
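The recipe described in the abstract is easy to prototype. Below is a minimal sketch, assuming a standard pretrained PyTorch image classifier: it computes per-example FGSM directions on a small batch of test images, stacks them into a matrix, and takes the top right singular vector, scaled to a target L2 norm, as the universal perturbation. The helper names (fgsm_directions, universal_from_svd) and the eps value are illustrative assumptions, not taken from the authors' repository.

```python
import torch
import torch.nn.functional as F

def fgsm_directions(model, images, labels):
    """Per-example FGSM directions: sign of the loss gradient w.r.t. each input."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    return images.grad.sign()

def universal_from_svd(perturbations, eps):
    """Universal perturbation: top right singular vector of the stacked
    per-example perturbations, scaled to L2 norm eps."""
    M = perturbations.flatten(1)                      # (n, d), one row per image
    _, _, Vh = torch.linalg.svd(M, full_matrices=False)
    v = Vh[0]                                         # dominant shared direction
    # The sign of a singular vector is arbitrary; in practice one evaluates
    # both +v and -v and keeps whichever fools the model more often.
    return (eps * v / v.norm()).view_as(perturbations[0])

# Hypothetical usage on a small test sample (e.g., 64 images), assuming
# `model`, `images`, and `labels` are a pretrained classifier and a test batch:
# dirs = fgsm_directions(model, images, labels)
# uap = universal_from_svd(dirs, eps=10.0)
# fooled = (model(images + uap).argmax(1) != model(images).argmax(1)).float().mean()
```

The SVD step is the entire universalization: no model internals are needed beyond the per-example attack directions, which is why the overhead over the base attack is negligible.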
