Paper Title
FakeOut: Leveraging Out-of-domain Self-supervision for Multi-modal Video Deepfake Detection
Paper Authors
Paper Abstract
Video synthesis methods have rapidly improved in recent years, allowing easy creation of synthetic humans. This poses a problem, especially in the era of social media, as synthetic videos of speaking humans can be used to spread misinformation in a convincing manner. Thus, there is a pressing need for accurate and robust deepfake detection methods that can detect forgery techniques not seen during training. In this work, we explore whether this can be done by leveraging a multi-modal, out-of-domain backbone trained in a self-supervised manner, adapted to the video deepfake domain. We propose FakeOut, a novel approach that relies on multi-modal data throughout both the pre-training phase and the adaptation phase. We demonstrate the efficacy and robustness of FakeOut in detecting various types of deepfakes, especially manipulations not seen during training. Our method achieves state-of-the-art results in cross-dataset generalization on audio-visual datasets. This study shows that, perhaps surprisingly, training on out-of-domain videos (i.e., videos not especially featuring speaking humans) can lead to better deepfake detection systems. Code is available on GitHub.
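The abstract describes a transfer-learning pattern: reuse a multi-modal backbone pre-trained with self-supervision on out-of-domain video, and adapt it to deepfake detection by training a small classification head. The sketch below illustrates that pattern only; the encoders, pooling, concatenation fusion, and logistic-regression head are illustrative stand-ins, not the paper's actual architecture or code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for a frozen, self-supervised multi-modal backbone.
# In a FakeOut-style setup these would be pre-trained on out-of-domain data
# and kept fixed while only a small head is adapted.
def video_encoder(frames):       # frames: (T, D) placeholder features
    return frames.mean(axis=0)   # pooled video embedding

def audio_encoder(spectrogram):  # spectrogram: (T, D) placeholder features
    return spectrogram.mean(axis=0)

def embed(video, audio):
    # Fuse modalities by concatenation (one common choice; the paper's
    # actual fusion may differ).
    return np.concatenate([video_encoder(video), audio_encoder(audio)])

# Toy "dataset": random features standing in for real/fake clips.
X = np.stack([embed(rng.normal(size=(8, 4)), rng.normal(size=(8, 4)))
              for _ in range(64)])
y = rng.integers(0, 2, size=64).astype(float)  # 1 = fake, 0 = real

# Adaptation phase: train only a logistic-regression head on top of the
# frozen multi-modal embeddings.
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid
    w -= 0.5 * (X.T @ (p - y)) / len(y)     # gradient step on weights
    b -= 0.5 * (p - y).mean()               # gradient step on bias

probs = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # one fake-probability per clip
print(probs.shape)
```

The point of the sketch is the division of labor: all representational power sits in the frozen out-of-domain backbone, and adaptation touches only the lightweight head.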