Paper Title
Model-Contrastive Learning for Backdoor Defense
Paper Authors
Paper Abstract
Due to the popularity of Artificial Intelligence (AI) techniques, we are witnessing an increasing number of backdoor injection attacks that are designed to maliciously threaten Deep Neural Networks (DNNs), causing misclassification. Although there exist various defense methods that can effectively erase backdoors from DNNs, they greatly suffer from both a high Attack Success Rate (ASR) and a non-negligible loss in Benign Accuracy (BA). Inspired by the observation that a backdoored DNN tends to form a new cluster for poisoned data in its feature space, in this paper we propose a novel two-stage backdoor defense method, named MCLDef, based on Model-Contrastive Learning (MCL). In the first stage, our approach performs trigger inversion based on trigger synthesis, where the resultant trigger can be used to generate poisoned data. In the second stage, under the guidance of MCL and our defined positive and negative pairs, MCLDef purifies the backdoored model by pulling the feature representations of poisoned data towards those of their clean-data counterparts. As the cluster of poisoned data shrinks, the backdoor formed by end-to-end supervised learning is eliminated. Comprehensive experimental results show that, with only 5% of clean data, MCLDef significantly outperforms state-of-the-art defense methods, reducing ASR by up to 95.79%, while in most cases the BA degradation is kept below 2%. Our code is available at https://github.com/WeCanShow/MCL.
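Taking the abstract's two stages at face value, the sketch below illustrates how such a pipeline could look in PyTorch. It is a minimal sketch under explicit assumptions: stage 1 uses a generic Neural-Cleanse-style inversion objective, stage 2 uses a MOON-style InfoNCE contrastive loss, and `model.features`, the 32x32 input size, the pairing scheme, and all hyperparameters are illustrative placeholders rather than the authors' exact design (the linked repository contains the actual implementation).

```python
import torch
import torch.nn.functional as F

def invert_trigger(model, loader, target_label, steps=500, lam=1e-2, lr=0.1):
    """Stage 1 (assumed Neural-Cleanse-style objective): synthesize a small
    trigger that flips any stamped input to `target_label`."""
    mask = torch.zeros(1, 1, 32, 32, requires_grad=True)       # assumed 32x32 inputs
    pattern = torch.rand(1, 3, 32, 32, requires_grad=True)
    opt = torch.optim.Adam([mask, pattern], lr=lr)
    for _ in range(steps):
        for x, _ in loader:
            m, p = torch.sigmoid(mask), torch.sigmoid(pattern)
            stamped = (1 - m) * x + m * p                      # stamp candidate trigger
            target = torch.full((x.size(0),), target_label, dtype=torch.long)
            # Misclassification loss plus an L1 penalty keeping the mask small.
            loss = F.cross_entropy(model(stamped), target) + lam * m.abs().sum()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return torch.sigmoid(mask).detach(), torch.sigmoid(pattern).detach()

def mcl_purification_loss(model, frozen_backdoored, clean_x, mask, pattern, tau=0.5):
    """Stage 2 (assumed MOON-style InfoNCE): pull poisoned-data features toward
    their clean counterparts and away from the frozen backdoored model's features.
    `model.features(...)` is a hypothetical accessor for penultimate embeddings."""
    poisoned_x = (1 - mask) * clean_x + mask * pattern         # stage-1 trigger applied
    z_poison = model.features(poisoned_x)                      # anchor
    z_clean = model.features(clean_x)                          # positive pair
    with torch.no_grad():
        z_bad = frozen_backdoored.features(poisoned_x)         # negative pair
    pos = torch.exp(F.cosine_similarity(z_poison, z_clean) / tau)
    neg = torch.exp(F.cosine_similarity(z_poison, z_bad) / tau)
    return -torch.log(pos / (pos + neg)).mean()                # shrinks poisoned cluster
```

In a complete fine-tuning loop, this contrastive term would plausibly be combined with a standard cross-entropy loss on the small clean set (the 5% mentioned above) so that BA is preserved while the poisoned cluster collapses; consult the repository for the method's actual loss weighting and schedule.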