Paper Title

OGAN: Disrupting Deepfakes with an Adversarial Attack that Survives Training

Paper Authors

Eran Segalis, Eran Galili

Paper Abstract

Recent advances in autoencoders and generative models have given rise to effective video forgery methods, used for generating so-called "deepfakes". Mitigation research is mostly focused on post-factum deepfake detection and not on prevention. We complement these efforts by introducing a novel class of adversarial attacks---training-resistant attacks---which can disrupt face-swapping autoencoders whether or not their adversarial images have been included in the training set of said autoencoders. We propose the Oscillating GAN (OGAN) attack, a novel attack optimized to be training-resistant, which introduces spatial-temporal distortions to the output of face-swapping autoencoders. To implement OGAN, we construct a bilevel optimization problem, where we train a generator and a face-swapping model instance against each other. Specifically, we pair each input image with a target distortion, and feed them into a generator that produces an adversarial image. This image will exhibit the distortion when a face-swapping autoencoder is applied to it. We solve the optimization problem by training the generator and the face-swapping model simultaneously using an iterative process of alternating optimization. Next, we analyze the previously published Distorting Attack and show it is training-resistant, though it is outperformed by our suggested OGAN. Finally, we validate both attacks using a popular implementation of FaceSwap, and show that they transfer across different target models and target faces, including faces the adversarial attacks were not trained on. More broadly, these results demonstrate the existence of training-resistant adversarial attacks, potentially applicable to a wide range of domains.
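
The abstract describes an alternating (bilevel) optimization in which a perturbation generator and a face-swapping autoencoder are trained against each other. The PyTorch sketch below illustrates that training loop under stated assumptions: the `Generator` and `FaceSwapAE` architectures, the additive-perturbation form, the use of MSE losses, and the loss weights (the 0.05 perturbation scale, the 10.0 similarity weight) are all illustrative placeholders, not the paper's actual implementation.

```python
# Minimal sketch of OGAN-style alternating optimization, assuming toy
# architectures and MSE losses; the paper's models and losses differ.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps (input image, target distortion) -> adversarial image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, x, distortion):
        # Small additive perturbation keeps the adversarial image close to x.
        perturbation = 0.05 * self.net(torch.cat([x, distortion], dim=1))
        return torch.clamp(x + perturbation, 0.0, 1.0)

class FaceSwapAE(nn.Module):
    """Stand-in for a face-swapping autoencoder (encoder + decoder)."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 16, 4, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, x):
        return self.dec(self.enc(x))

G, F = Generator(), FaceSwapAE()
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_f = torch.optim.Adam(F.parameters(), lr=1e-4)
mse = nn.MSELoss()

for step in range(100):
    x = torch.rand(8, 3, 64, 64)   # placeholder face crops
    d = torch.rand_like(x)         # placeholder target distortions

    # (1) Autoencoder step: F trains on the adversarial images as if they
    # were ordinary data, which is what forces the attack to survive
    # inclusion in F's training set.
    adv = G(x, d).detach()
    opt_f.zero_grad()
    loss_f = mse(F(adv), adv) + mse(F(x), x)
    loss_f.backward()
    opt_f.step()

    # (2) Generator step: F's output on the adversarial image should exhibit
    # the paired target distortion, while the adversarial image itself stays
    # visually close to the clean input x.
    opt_g.zero_grad()
    adv = G(x, d)
    loss_g = mse(F(adv), d) + 10.0 * mse(adv, x)
    loss_g.backward()
    opt_g.step()
```

The detach in step (1) reflects the alternating scheme: when the autoencoder updates, the adversarial images are treated as fixed training data; when the generator updates, gradients flow through the autoencoder to shape the distortion it will produce.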
