Paper Title

Frequency-driven Imperceptible Adversarial Attack on Semantic Similarity

Paper Authors

Cheng Luo, Qinliang Lin, Weicheng Xie, Bizhu Wu, Jinheng Xie, Linlin Shen

Paper Abstract

Current adversarial attack research reveals the vulnerability of learning-based classifiers to carefully crafted perturbations. However, most existing attack methods have an inherent limitation in cross-dataset generalization, as they rely on a classification layer with a closed set of categories. Furthermore, the perturbations generated by these methods may appear in regions easily perceptible to the human visual system (HVS). To circumvent the former problem, we propose a novel algorithm that attacks semantic similarity on feature representations. In this way, we are able to fool classifiers without limiting attacks to a specific dataset. For imperceptibility, we introduce a low-frequency constraint that restricts perturbations to high-frequency components, ensuring perceptual similarity between adversarial examples and the originals. Extensive experiments on three datasets (CIFAR-10, CIFAR-100, and ImageNet-1K) and three public online platforms indicate that our attack yields misleading and transferable adversarial examples across architectures and datasets. Additionally, visualization results and quantitative performance (in terms of four different metrics) show that the proposed algorithm generates more imperceptible perturbations than state-of-the-art methods. Code is made available.
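The abstract combines two ingredients: an attack loss defined on feature-space semantic similarity rather than class logits, and a constraint that keeps perturbations out of the image's low-frequency band. The PyTorch sketch below illustrates these ideas only; it is a hypothetical reconstruction, not the authors' released code. The `feature_extractor` handle, the cosine-similarity loss, the average-pooling stand-in for a Haar wavelet decomposition, and all hyperparameters (`step_size`, `lf_weight`, `steps`) are assumptions made for illustration.

```python
# Minimal sketch of a feature-similarity attack with a low-frequency
# penalty. All names and hyperparameters here are illustrative
# assumptions, not the paper's actual implementation.

import torch
import torch.nn.functional as F


def haar_lowpass(x: torch.Tensor) -> torch.Tensor:
    """Crude one-level low-frequency approximation: 2x2 average pooling
    upsampled back to input size. A stand-in for a proper Haar DWT LL band;
    assumes x is (B, C, H, W) with even H and W."""
    ll = F.avg_pool2d(x, kernel_size=2)
    return F.interpolate(ll, scale_factor=2, mode="nearest")


def attack_step(x, feature_extractor, step_size=1e-2, lf_weight=10.0, steps=40):
    """Iteratively perturb `x` so its features move away from the clean
    image's features, while penalizing low-frequency changes so the
    perturbation stays in high-frequency detail."""
    f_clean = feature_extractor(x).detach()
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        x_adv = (x + delta).clamp(0, 1)
        f_adv = feature_extractor(x_adv)
        # Push adversarial features away from the clean ones; cosine
        # similarity is one plausible choice of "semantic similarity".
        sim_loss = F.cosine_similarity(
            f_adv.flatten(1), f_clean.flatten(1)).mean()
        # Penalize the perturbation's low-frequency component, nudging
        # changes toward bands the HVS perceives less readily.
        lf_loss = haar_lowpass(delta).abs().mean()
        loss = sim_loss + lf_weight * lf_loss
        loss.backward()
        with torch.no_grad():
            delta -= step_size * delta.grad.sign()
            delta.grad.zero_()
    return (x + delta.detach()).clamp(0, 1)
```

Because the loss never touches a classification layer, the same procedure can in principle be run with any pretrained backbone as `feature_extractor`, which is the property the abstract credits for cross-dataset transferability.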
