Paper Title
Leveraging Extracted Model Adversaries for Improved Black Box Attacks
Paper Authors
Paper Abstract
We present a method for adversarial input generation against black box models for reading comprehension based question answering. Our approach is composed of two steps. First, we approximate a victim black box model via model extraction (Krishna et al., 2020). Second, we use our own white box method to generate input perturbations that cause the approximate model to fail. These perturbed inputs are used against the victim. In experiments we find that our method improves on the efficacy of the AddAny---a white box attack---performed on the approximate model by 25% F1, and the AddSent attack---a black box attack---by 11% F1 (Jia and Liang, 2017).
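To make the two-step pipeline in the abstract concrete, here is a minimal sketch of how extraction-then-transfer could be organized. It is illustrative only: `query_victim`, `extract_model`, `white_box_perturb`, and `attack_victim` are hypothetical placeholders, not the authors' code or the actual AddAny implementation.

```python
# Hypothetical sketch of the extraction-then-attack pipeline; all names are
# placeholders for illustration, not the paper's released implementation.
from typing import Callable, Dict, List

QAExample = Dict[str, str]  # keys: "question", "passage", optionally "answer"


def extract_model(query_victim: Callable[[str, str], str],
                  unlabeled: List[QAExample]) -> List[QAExample]:
    """Step 1: approximate the black-box victim via model extraction
    (cf. Krishna et al., 2020) by labeling inputs with the victim's answers."""
    labeled = []
    for ex in unlabeled:
        labeled.append({**ex, "answer": query_victim(ex["question"], ex["passage"])})
    # In the real pipeline, a local white-box QA model would now be trained on
    # `labeled`; here the distillation data itself stands in for that model.
    return labeled


def white_box_perturb(question: str, passage: str) -> str:
    """Placeholder for an AddAny-style perturbation computed against the
    extracted model, where gradients and confidences are accessible."""
    return passage + " the the the"  # dummy distractor tokens


def attack_victim(query_victim: Callable[[str, str], str],
                  eval_set: List[QAExample]) -> float:
    """Step 2: replay the perturbed inputs against the black-box victim and
    measure how often its answer changes (a rough proxy for attack success)."""
    flips = 0
    for ex in eval_set:
        perturbed = white_box_perturb(ex["question"], ex["passage"])
        if query_victim(ex["question"], perturbed) != ex.get("answer", ""):
            flips += 1
    return flips / max(len(eval_set), 1)
```

The key design point the abstract highlights is that the expensive, gradient-dependent search for perturbations happens entirely on the locally extracted model; the victim is only queried to label extraction data and to evaluate the transferred attacks.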