Paper Title
Learning from Attacks: Attacking Variational Autoencoder for Improving Image Classification
Paper Authors
Paper Abstract
Adversarial attacks are often considered threats to the robustness of Deep Neural Networks (DNNs), and various defense techniques have been developed to mitigate their potential negative impact on task predictions. This work analyzes adversarial attacks from a different perspective: adversarial examples contain implicit information that is useful for the prediction task, i.e., image classification, and adversarial attacks against DNNs for data self-expression can be treated as extracted abstract representations capable of facilitating specific learning tasks. We propose an algorithmic framework that leverages the strengths of DNNs for data self-expression and for task-specific prediction to improve image classification. The framework jointly learns a DNN for attacking Variational Autoencoder (VAE) networks and a DNN for classification, coined Attacking VAE for Improving Classification (AVIC). Experimental results show that AVIC achieves higher accuracy on standard datasets than training with clean examples and traditional adversarial training.
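To make the described pipeline concrete, below is a minimal PyTorch sketch of the idea: perturb inputs so as to increase the VAE's reconstruction error, then train the classifier on the resulting adversarial examples. The abstract refers to a learned attacker DNN and a joint training scheme whose details are not given here, so the PGD-style attack used in place of that learned attacker, the function names (`attack_vae`, `avic_train_step`), and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the AVIC idea, assuming a PGD-style attack on the VAE's
# reconstruction loss as a stand-in for the learned attacker DNN.

import torch
import torch.nn.functional as F

def attack_vae(vae, x, eps=0.03, steps=10, alpha=0.01):
    """Craft a bounded perturbation that increases the VAE reconstruction
    error. `vae` is assumed to map an input batch to its reconstruction."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        recon = vae(x + delta)
        loss = F.mse_loss(recon, x)  # push the reconstruction away from x
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += alpha * grad.sign()   # gradient-ascent step
            delta.clamp_(-eps, eps)        # keep the perturbation small
    return (x + delta).detach()

def avic_train_step(vae, classifier, optimizer, x, y):
    """One training step: attack the VAE, then fit the classifier on the
    adversarial examples, treating them as task-useful representations."""
    x_adv = attack_vae(vae, x)
    loss = F.cross_entropy(classifier(x_adv), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the classifier is trained purely on the VAE-attacked examples; whether the paper also mixes in clean examples or updates the attacker and classifier jointly is not stated in the abstract.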