Paper Title
Multimodal Graph Learning for Deepfake Detection
Paper Authors
Paper Abstract
Existing deepfake detectors face several challenges in achieving robustness and generalization. One of the primary reasons is their limited ability to extract relevant information from forged videos, especially in the presence of various artifacts such as spatial, frequency, temporal, and landmark mismatches. Current detectors rely either on pixel-level features, which are easily affected by unknown disturbances, or on facial landmarks, which do not provide sufficient information. Furthermore, most detectors cannot utilize information from multiple domains for detection, leading to limited effectiveness in identifying deepfake videos. To address these limitations, we propose a novel framework, namely Multimodal Graph Learning (MGL), that leverages information from multiple modalities using two GNNs and several multimodal fusion modules. At the frame level, we employ a bi-directional cross-modal transformer and an adaptive gating mechanism to combine the features from the spatial and frequency domains with the geometric-enhanced landmark features captured by a GNN. At the video level, we use a Graph Attention Network (GAT) to represent each frame in a video as a node in a graph and encode temporal information into the edges of the graph to extract temporal inconsistencies between frames. Our proposed method aims to effectively identify and utilize distinguishing features for deepfake detection. We evaluate the effectiveness of our method through extensive experiments on widely used benchmarks and demonstrate that our method outperforms state-of-the-art detectors in terms of generalization ability and robustness against unknown disturbances.
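To make the two fusion ideas in the abstract more concrete, the sketch below illustrates (1) an adaptive gate that blends frame-level spatial/frequency features with landmark features, and (2) a simple graph-attention pass over per-frame nodes whose edge weights encode temporal distance. This is a minimal illustration, not the authors' implementation: the module names (`AdaptiveGateFusion`, `TemporalFrameGAT`), the feature dimensions, and the scalar temporal-gap edge encoding are all assumptions made for the example.

```python
# Minimal sketch (not the authors' code) of frame-level gated fusion and a
# video-level graph-attention step with temporal edge information.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveGateFusion(nn.Module):
    """Gated fusion of two frame-level feature streams (assumed design)."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, spat_freq: torch.Tensor, landmark: torch.Tensor) -> torch.Tensor:
        # g in (0, 1) decides, per channel, how much of each stream to keep.
        g = torch.sigmoid(self.gate(torch.cat([spat_freq, landmark], dim=-1)))
        return g * spat_freq + (1.0 - g) * landmark


class TemporalFrameGAT(nn.Module):
    """Single-head attention over frame nodes; edges carry temporal gaps."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.edge_bias = nn.Linear(1, 1)  # scalar bias from |i - j| (assumption)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (T, dim) -- one fused feature vector per frame (node).
        T, d = frames.shape
        scores = self.q(frames) @ self.k(frames).t() / d ** 0.5           # (T, T)
        gap = (torch.arange(T).unsqueeze(0) - torch.arange(T).unsqueeze(1)).abs().float()
        scores = scores + self.edge_bias(gap.unsqueeze(-1)).squeeze(-1)   # temporal edge info
        attn = F.softmax(scores, dim=-1)
        return attn @ self.v(frames)                                      # updated node features


if __name__ == "__main__":
    fuse, gat = AdaptiveGateFusion(128), TemporalFrameGAT(128)
    spat_freq, landmark = torch.randn(16, 128), torch.randn(16, 128)      # 16 frames
    video_nodes = fuse(spat_freq, landmark)
    pooled = gat(video_nodes).mean(dim=0)                                 # video-level feature
    print(pooled.shape)  # torch.Size([128])
```

The gate plays the role the abstract assigns to the adaptive gating mechanism (deciding how much weight each modality receives per frame), while the attention scores biased by frame distance stand in for the GAT's temporal edges; the paper's actual architecture may differ in both structure and detail.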