Paper Title
Model Extraction Attacks on Graph Neural Networks: Taxonomy and Realization
Paper Authors
Paper Abstract
Machine learning models have been shown to be vulnerable to model extraction attacks, in which a well-trained private model owned by a service provider can be stolen by an attacker posing as a client. Unfortunately, prior work focuses on models trained over Euclidean data, e.g., images and texts, while how to extract a GNN model that encodes both graph structure and node features remains unexplored. In this paper, we comprehensively investigate and develop model extraction attacks against GNN models for the first time. We first systematically formalise the threat model in the context of GNN model extraction and classify the adversarial threats into seven categories according to the attacker's background knowledge, e.g., the attributes and/or neighbour connections of the nodes accessible to the attacker. We then present detailed attack methods that exploit the knowledge available under each threat. Evaluations on three real-world datasets show that our attacks effectively extract duplicated models, i.e., 84%-89% of the inputs in the target domain receive the same output predictions as the victim model.
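To make the reported fidelity figure concrete, the following is a minimal sketch (not the paper's released code) of how agreement between an extracted model and the victim model could be measured: fidelity is the fraction of target-domain inputs on which the two models produce the same prediction. The `victim_model`, `extracted_model`, `data`, and `target_mask` objects are assumed here for illustration and are not artifacts from the paper.

```python
# Minimal sketch (assumed setup, not the paper's code) of extraction fidelity:
# the fraction of target-domain nodes for which the extracted (surrogate) model
# predicts the same label as the victim model.
import torch


@torch.no_grad()
def extraction_fidelity(victim_model, extracted_model, data, target_mask):
    """Fraction of nodes in `target_mask` with identical predicted labels.

    victim_model / extracted_model: GNNs mapping (x, edge_index) -> class logits.
    data: a graph object with node features `x` and connectivity `edge_index`
          (e.g., a torch_geometric.data.Data instance).
    target_mask: boolean tensor selecting the target-domain nodes.
    """
    victim_model.eval()
    extracted_model.eval()

    victim_pred = victim_model(data.x, data.edge_index).argmax(dim=-1)
    surrogate_pred = extracted_model(data.x, data.edge_index).argmax(dim=-1)

    agree = (victim_pred[target_mask] == surrogate_pred[target_mask]).float()
    return agree.mean().item()  # e.g., roughly 0.84-0.89 in the paper's results


# Hypothetical usage:
# fidelity = extraction_fidelity(victim_gnn, stolen_gnn, graph_data, test_mask)
# print(f"Fidelity: {fidelity:.2%}")
```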