Paper Title
Model Inversion Attack against Transfer Learning: Inverting a Model without Accessing It
Paper Authors
Paper Abstract
Transfer learning is an important approach that produces pre-trained teacher models, which can be used to quickly build specialized student models. Recent research has found that transfer learning is vulnerable to various attacks, e.g., misclassification and backdoor attacks. However, it is still not clear whether transfer learning is vulnerable to model inversion attacks. Launching a model inversion attack against a transfer learning scheme is challenging: not only does the student model hide its structural parameters, but it is also inaccessible to the adversary. Hence, when targeting a student model, both the white-box and black-box versions of existing model inversion attacks fail. White-box attacks fail because they need the target model's parameters; black-box attacks fail because they depend on making repeated queries to the target model. However, these failures do not mean that transfer learning models are impervious to model inversion attacks. Hence, in this paper, we initiate research into model inversion attacks against transfer learning schemes with two novel attack methods. Both are black-box attacks, suited to different situations, that do not rely on queries to the target student model. In the first method, the adversary has data samples that share the same distribution as the training set of the teacher model. In the second method, the adversary does not have any such samples. Experiments show that highly recognizable data records can be recovered with both of these methods. This means that even if a model is an inaccessible black box, it can still be inverted.
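To illustrate why white-box model inversion fails without parameter access, below is a minimal sketch (not the paper's method) of the classic white-box setting: with full knowledge of a model's weights, an adversary can run gradient ascent on the *input* to reconstruct a representative example of a target class. The toy linear-softmax model, its shapes, and the learning rate are all illustrative assumptions; the point is that every step uses the weights `W` and `b`, which are exactly what a hidden student model withholds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "target model": a linear softmax classifier with *known* weights.
# (Assumed shapes: 3 classes, 5 input features.)
W = rng.normal(size=(3, 5))
b = rng.normal(size=3)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def class_prob(x, c):
    """Probability the model assigns class c to input x."""
    return softmax(W @ x + b)[c]

# White-box inversion: gradient ascent on x to maximize p(target | x).
target = 0
x = np.zeros(5)
lr = 0.5
onehot = np.eye(3)[target]
for _ in range(200):
    p = softmax(W @ x + b)
    # Gradient of log p(target | x) w.r.t. x for a linear-softmax model;
    # note it requires the model parameters W.
    grad = W.T @ (onehot - p)
    x += lr * grad

print(class_prob(x, target))  # confidence in the target class after inversion
```

Because the update rule depends on `W` at every step, this attack is impossible when the student model's parameters are hidden, which motivates the query-free black-box methods the paper proposes.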