Paper Title
Towards Adversarial Purification using Denoising AutoEncoders
Paper Authors
Paper Abstract
With the rapid advancement and increasing use of deep learning models in image recognition, security has become a major concern for their deployment in safety-critical systems. Since the accuracy and robustness of deep learning models are primarily attributed to the purity of the training samples, deep learning architectures are often susceptible to adversarial attacks. Adversarial examples are typically obtained by making subtle perturbations to normal images; these perturbations are mostly imperceptible to humans but can seriously confuse state-of-the-art machine learning models. We propose a framework, named APuDAE, that leverages Denoising AutoEncoders (DAEs) to purify such samples in an adaptive way and thereby improve the classification accuracy of target classifier networks under attack. We also show how using DAEs adaptively, rather than directly, further improves classification accuracy and is more robust against adaptive attacks designed to fool them. We demonstrate our results on the MNIST, CIFAR-10, and ImageNet datasets and show how our framework APuDAE provides performance comparable to, and in most cases better than, baseline purification methods. We also design adaptive attacks specifically targeted at our purification model and demonstrate the robustness of our defense against them.
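To make the purification idea concrete, below is a minimal sketch of DAE-based purification before classification. It assumes a PyTorch implementation, a small convolutional DAE for 28x28 grayscale (MNIST-style) images, and a stand-in classifier; these architectural details are illustrative assumptions, not the paper's actual APuDAE implementation, and the adaptive variant described in the abstract is not shown.

# Hypothetical sketch: purify a (possibly adversarial) input with a
# denoising autoencoder (DAE), then classify the purified image.
# Architecture and details are illustrative, not the authors' exact APuDAE.
import torch
import torch.nn as nn

class DenoisingAutoEncoder(nn.Module):
    """A small convolutional DAE for 28x28 grayscale images."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1),   # 28x28 -> 14x14
            nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1),  # 14x14 -> 7x7
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1,
                               output_padding=1),        # 7x7 -> 14x14
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1,
                               output_padding=1),        # 14x14 -> 28x28
            nn.Sigmoid(),                                # pixels in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def purify_and_classify(x_adv, dae, classifier):
    """Map a perturbed batch back toward the clean data manifold, then classify.

    A DAE trained on (noisy input -> clean target) pairs tends to remove
    small adversarial perturbations along with the noise it was trained on.
    """
    with torch.no_grad():
        x_purified = dae(x_adv)        # purification step
        logits = classifier(x_purified)
    return logits.argmax(dim=1)

# Usage with randomly initialized modules and dummy data:
dae = DenoisingAutoEncoder()
clf = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # stand-in classifier
x = torch.rand(4, 1, 28, 28)   # a batch of possibly perturbed images
preds = purify_and_classify(x, dae, clf)

In practice both the DAE and the classifier would be trained beforehand; the key design choice this sketch reflects is that purification is a preprocessing step, so the target classifier itself never needs to be retrained to gain robustness.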