Generating Adversarial Examples with Controllable Non-transferability

Paper Title

Generating Adversarial Examples with Controllable Non-transferability

Authors

Wang, Renzhi; Zhang, Tianwei; Xie, Xiaofei; Ma, Lei; Tian, Cong; Juefei-Xu, Felix; Liu, Yang

Abstract

Adversarial attacks against Deep Neural Networks have been widely studied. One significant feature that makes such attacks particularly powerful is transferability: adversarial examples generated from one model can be effective against other similar models as well. Many works have sought to increase transferability. However, how to decrease transferability and craft malicious samples only for specific target models has not yet been explored. In this paper, we design novel attack methodologies to generate adversarial examples with controllable non-transferability. With these methods, an adversary can efficiently produce precise adversarial examples that attack a set of target models he desires while remaining benign to other models. The first method is Reversed Loss Function Ensemble, where the adversary crafts qualified examples from the gradients of a reversed loss function. This approach is effective in the white-box and gray-box settings. The second method is Transferability Classification: the adversary trains a transferability-aware classifier from the perturbations of adversarial examples. This classifier then provides guidance for generating non-transferable adversarial examples. This approach can be applied in the black-box scenario. Evaluation results demonstrate the effectiveness and efficiency of our proposed methods. This work opens up a new route for generating adversarial examples with new features and applications.
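The abstract does not state the reversed loss explicitly. As a rough illustration of the Reversed Loss Function Ensemble idea, the sketch below assumes a simple formulation: an untargeted, sign-gradient (PGD-style) L-infinity attack that descends a combined loss in which the cross-entropy at the true label is negated for the target models (pushing them toward misclassification) and kept as-is for the non-target models (keeping them correct). The linear softmax "models" and the names softmax, ce_input_grad, predict, reversed_loss_ensemble_attack, eps, alpha and steps are hypothetical; the paper's actual loss, weighting and optimizer may differ.

```python
import numpy as np

# Hypothetical toy setup (not from the paper): each "model" is a linear
# softmax classifier (W, b), so the input gradient of cross-entropy is closed-form.


def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def ce_input_grad(W, b, x, y):
    """Cross-entropy of a linear softmax model at label y, and its gradient w.r.t. x."""
    p = softmax(W @ x + b)
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    loss = -np.log(p[y] + 1e-12)
    grad_x = W.T @ (p - onehot)  # dCE/dx when logits = W x + b
    return loss, grad_x


def predict(W, b, x):
    return int(np.argmax(W @ x + b))


def reversed_loss_ensemble_attack(x, y, targets, others, eps=0.5, alpha=0.05, steps=100):
    """Sign-gradient L-inf attack on an ASSUMED combined loss:
        sum over targets of -CE(x', y)  +  sum over others of CE(x', y)
    Descending it raises CE on target models (misclassify) and lowers it on
    the other models (stay correct). Not the paper's exact formulation.
    """
    x_adv = x.copy()
    for _ in range(steps):
        grad = np.zeros_like(x)
        for W, b in targets:
            grad -= ce_input_grad(W, b, x_adv, y)[1]  # reversed (negated) term
        for W, b in others:
            grad += ce_input_grad(W, b, x_adv, y)[1]  # ordinary term
        x_adv = x_adv - alpha * np.sign(grad)  # step down the combined loss
        x_adv = np.clip(x_adv, x - eps, x + eps)  # stay within the L-inf budget
    return x_adv


# Usage: the target model looks only at x[0], the non-target model only at x[1].
target = (np.array([[1.0, 0.0], [-1.0, 0.0]]), np.zeros(2))
other = (np.array([[0.0, 1.0], [0.0, -1.0]]), np.zeros(2))
x = np.array([0.2, 0.2])  # both models predict class 0 on the clean input
x_adv = reversed_loss_ensemble_attack(x, 0, [target], [other])
print(predict(*target, x_adv), predict(*other, x_adv))  # prints "1 0"
```

The toy models read disjoint input coordinates, so both objectives can be satisfied at once: the attack lowers x[0] to flip the target model while raising x[1] to keep the other model on class 0. With real networks the negated and ordinary terms generally compete, and how to weight them is a further design choice the abstract does not specify.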
