Paper Title

Multiple Perturbation Attack: Attack Pixelwise Under Different $\ell_p$-norms For Better Adversarial Performance

Paper Authors

Tran, Ngoc N., Bui, Anh Tuan, Phung, Dinh, Le, Trung

Paper Abstract

Adversarial machine learning has been both a major concern and a hot topic recently, especially with the ubiquitous use of deep neural networks in the current landscape. Adversarial attacks and defenses are usually likened to a cat-and-mouse game in which defenders and attackers evolve over time. On one hand, the goal is to develop strong and robust deep networks that are resistant to malicious actors. On the other hand, in order to achieve that, we need to devise even stronger adversarial attacks to challenge these defense models. Most existing attacks employ a single $\ell_p$ distance (commonly, $p\in\{1,2,\infty\}$) to define the concept of closeness and perform steepest gradient ascent w.r.t. this $p$-norm to update all pixels in an adversarial example in the same way. Each of these $\ell_p$ attacks has its own pros and cons, and no single attack can successfully break through defense models that are robust against multiple $\ell_p$ norms simultaneously. Motivated by these observations, we come up with a natural approach: combining various $\ell_p$ gradient projections at the pixel level to achieve a joint adversarial perturbation. Specifically, we learn how to perturb each pixel to maximize the attack performance while maintaining the overall visual imperceptibility of the adversarial examples. Finally, through various experiments with standardized benchmarks, we show that our method outperforms most current strong attacks across state-of-the-art defense mechanisms, while keeping its adversarial examples visually clean.
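To make the idea of pixel-level combination concrete, below is a minimal PyTorch sketch of a single attack step that mixes the $\ell_\infty$ steepest-ascent direction (the gradient sign) with the $\ell_2$ direction (the per-example normalized gradient) via a per-pixel weight `w`. This is an illustrative simplification under assumed details, not the paper's actual algorithm: the function name `mixed_lp_step`, the fixed step size, the convex-combination rule, and the omission of any projection back onto an $\ell_p$ ball are all assumptions.

```python
import torch
import torch.nn.functional as F


def mixed_lp_step(model, x_adv, y, w, step_size=2.0 / 255):
    """One PGD-style ascent step mixing l_inf and l_2 gradient directions per pixel.

    Illustrative sketch, not the paper's exact method. `w` is a hypothetical
    per-pixel weight in [0, 1] with the same shape as `x_adv`: w = 1 follows the
    l_inf direction (gradient sign), w = 0 follows the l_2 direction
    (per-example normalized gradient).
    """
    x_adv = x_adv.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]

    # l_inf steepest-ascent direction: elementwise sign of the gradient.
    dir_inf = grad.sign()

    # l_2 steepest-ascent direction: gradient normalized per example.
    grad_norm = grad.flatten(1).norm(p=2, dim=1).view(-1, 1, 1, 1)
    dir_l2 = grad / (grad_norm + 1e-12)

    # Per-pixel convex combination of the two directions, then an ascent step;
    # projection onto a chosen l_p ball is omitted in this sketch.
    direction = w * dir_inf + (1.0 - w) * dir_l2
    return (x_adv + step_size * direction).detach().clamp(0.0, 1.0)
```

In the paper, the per-pixel weighting is learned so as to maximize attack success while keeping the perturbation visually imperceptible; in this sketch `w` is simply passed in, and a full attack would iterate such steps with an appropriate projection for the chosen threat model.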
