Paper Title

DECK: Model Hardening for Defending Pervasive Backdoors

Authors

Guanhong Tao, Yingqi Liu, Siyuan Cheng, Shengwei An, Zhuo Zhang, Qiuling Xu, Guangyu Shen, Xiangyu Zhang

Abstract

Pervasive backdoors are triggered by dynamic and pervasive input perturbations. They can be intentionally injected by attackers or naturally exist in normally trained models. They differ in nature from traditional static and localized backdoors, which can be triggered by perturbing a small input area with some fixed pattern, e.g., a patch with solid color. Existing defense techniques are highly effective for traditional backdoors. However, they may not work well for pervasive backdoors, especially regarding backdoor removal and model hardening. In this paper, we propose a novel model hardening technique against pervasive backdoors, including both natural and injected backdoors. We develop a general pervasive attack based on an encoder-decoder architecture enhanced with a special transformation layer. The attack can model a wide range of existing pervasive backdoor attacks and quantify them by class distances. As such, using the samples derived from our attack in adversarial training can harden a model against these backdoor vulnerabilities. Our evaluation on 9 datasets with 15 model structures shows that our technique can enlarge class distances by 59.65% on average with less than 1% accuracy degradation and no robustness loss, outperforming five hardening techniques such as adversarial training, universal adversarial training, MOTH, etc. It can reduce the attack success rate of six pervasive backdoor attacks from 99.06% to 1.94%, surpassing seven state-of-the-art backdoor removal techniques.
