Paper Title
Towards Achieving Adversarial Robustness by Enforcing Feature Consistency Across Bit Planes
Paper Authors
Paper Abstract
As humans, we inherently perceive images based on their predominant features, and ignore noise embedded within lower bit planes. On the contrary, Deep Neural Networks are known to confidently misclassify images corrupted with meticulously crafted perturbations that are nearly imperceptible to the human eye. In this work, we attempt to address this problem by training networks to form coarse impressions based on the information in higher bit planes, and use the lower bit planes only to refine their prediction. We demonstrate that, by imposing consistency on the representations learned across differently quantized images, the adversarial robustness of networks improves significantly when compared to a normally trained model. Present state-of-the-art defenses against adversarial attacks require the networks to be explicitly trained using adversarial samples that are computationally expensive to generate. While such methods that use adversarial training continue to achieve the best results, this work paves the way towards achieving robustness without having to explicitly train on adversarial samples. The proposed approach is therefore faster, and also closer to the natural learning process in humans.
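The idea of forming a coarse impression from the higher bit planes and comparing it with the representation of the full image can be illustrated with a small sketch. The abstract does not specify the quantization scheme, the network, or the form of the consistency term, so everything below is an assumption made for illustration: `keep_high_bit_planes`, `toy_features`, and `consistency_penalty` are hypothetical names, the "network" is replaced by a trivial per-channel mean, and the penalty is a plain mean squared difference rather than the loss used in the paper.

```python
import numpy as np


def keep_high_bit_planes(image: np.ndarray, k: int) -> np.ndarray:
    """Keep the k most significant bit planes of a uint8 image and zero the rest."""
    if image.dtype != np.uint8:
        raise TypeError("expected a uint8 image")
    if not 1 <= k <= 8:
        raise ValueError("k must be between 1 and 8")
    # For k = 4 the mask is 0b11110000: the four lowest bit planes are dropped.
    mask = np.uint8((0xFF << (8 - k)) & 0xFF)
    return image & mask


def toy_features(image: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a network: per-channel mean intensity in [0, 1]."""
    return image.astype(np.float64).mean(axis=(0, 1)) / 255.0


def consistency_penalty(feat_full: np.ndarray, feat_coarse: np.ndarray) -> float:
    """Assumed consistency term: mean squared difference between two feature arrays."""
    diff = np.asarray(feat_full, dtype=np.float64) - np.asarray(feat_coarse, dtype=np.float64)
    return float(np.mean(diff ** 2))


# Random stand-in for an 8-bit RGB image.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

# Coarse version that retains only the 4 highest bit planes.
coarse = keep_high_bit_planes(image, k=4)

# Penalty that a training loop could add to the usual classification loss.
penalty = consistency_penalty(toy_features(image), toy_features(coarse))
print(f"consistency penalty: {penalty:.6f}")
```

In an actual training setup the feature function would be the network being trained, and a term of this kind would be added to the classification loss so that representations of the full and coarsely quantized images stay close. No adversarial samples are generated in this sketch, which matches the abstract's claim that the approach avoids explicit adversarial training.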