Paper Title
Dynamic Backdoor Attacks Against Machine Learning Models
Paper Authors
Paper Abstract
Machine learning (ML) has made tremendous progress during the past decade and is being adopted in various critical real-world applications. However, recent research has shown that ML models are vulnerable to multiple security and privacy attacks. In particular, backdoor attacks against ML models have recently attracted considerable attention. A successful backdoor attack can have severe consequences, such as allowing an adversary to bypass critical authentication systems. Current backdooring techniques rely on adding static triggers (with fixed patterns and locations) to ML model inputs, which makes them prone to detection by current backdoor detection mechanisms. In this paper, we propose the first class of dynamic backdooring techniques against deep neural networks (DNNs), namely Random Backdoor, Backdoor Generating Network (BaN), and conditional Backdoor Generating Network (c-BaN). Triggers generated by our techniques can have random patterns and locations, which reduces the efficacy of current backdoor detection mechanisms. In particular, BaN and c-BaN, both built on a novel generative network, are the first two schemes that algorithmically generate triggers. Moreover, c-BaN is the first conditional backdooring technique: given a target label, it generates a target-specific trigger. Both BaN and c-BaN are essentially general frameworks that give the adversary the flexibility to further customize backdoor attacks. We extensively evaluate our techniques on three benchmark datasets: MNIST, CelebA, and CIFAR-10. Our techniques achieve almost perfect attack performance on backdoored data with negligible utility loss. We further show that our techniques can bypass current state-of-the-art defense mechanisms against backdoor attacks, including ABS, Februus, MNTD, Neural Cleanse, and STRIP.
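To make the dynamic-trigger idea concrete, below is a minimal PyTorch sketch of a BaN-style noise-to-trigger generator and a c-BaN-style conditional variant that takes the target label as an additional input, plus a helper that stamps the trigger at a random location. This is an illustrative sketch, not the paper's implementation: all layer sizes, the class names BaN/cBaN, and the stamp helper are assumptions, since the abstract only states that triggers are generated algorithmically from noise and, for c-BaN, conditioned on a target label.

```python
# Hypothetical sketch of dynamic trigger generation; architectures and
# sizes are assumptions, not the paper's actual networks.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BaN(nn.Module):
    """Maps a noise vector z to a trigger patch (illustrative sizes)."""

    def __init__(self, z_dim: int = 100, trigger_size: int = 8, channels: int = 3):
        super().__init__()
        self.trigger_size = trigger_size
        self.channels = channels
        self.net = nn.Sequential(
            nn.Linear(z_dim, 128),
            nn.ReLU(),
            nn.Linear(128, channels * trigger_size * trigger_size),
            nn.Sigmoid(),  # pixel values in [0, 1]
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        t = self.net(z)
        return t.view(-1, self.channels, self.trigger_size, self.trigger_size)


class cBaN(BaN):
    """Conditions the generator on a one-hot target label (c-BaN-style)."""

    def __init__(self, z_dim: int = 100, num_classes: int = 10, **kwargs):
        super().__init__(z_dim=z_dim + num_classes, **kwargs)
        self.num_classes = num_classes

    def forward(self, z: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        onehot = F.one_hot(target, self.num_classes).float()
        return super().forward(torch.cat([z, onehot], dim=1))


def stamp(images: torch.Tensor, triggers: torch.Tensor, x: int, y: int) -> torch.Tensor:
    """Overwrite a chosen (possibly random) location with the trigger."""
    backdoored = images.clone()
    s = triggers.shape[-1]
    backdoored[:, :, y:y + s, x:x + s] = triggers
    return backdoored


# Usage: target-specific triggers for label 3, stamped at a random
# location, mimicking the "random patterns and locations" property.
gen = cBaN()
z = torch.rand(4, 100)
labels = torch.full((4,), 3, dtype=torch.long)
imgs = torch.rand(4, 3, 32, 32)  # e.g. CIFAR-10-sized inputs
loc = torch.randint(0, 32 - 8 + 1, (2,))
poisoned = stamp(imgs, gen(z, labels), int(loc[0]), int(loc[1]))
print(poisoned.shape)  # torch.Size([4, 3, 32, 32])
```

Because both the noise z and the stamping location vary per input, each poisoned sample carries a different trigger pattern at a different position, which is what defeats defenses that search for a single fixed trigger.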