Paper Title
How to Steer Your Adversary: Targeted and Efficient Model Stealing Defenses with Gradient Redirection
Paper Authors
Paper Abstract
Model stealing attacks present a dilemma for public machine learning APIs. To protect financial investments, companies may be forced to withhold important information about their models that could facilitate theft, including uncertainty estimates and prediction explanations. This compromise is harmful not only to users but also to external transparency. Model stealing defenses seek to resolve this dilemma by making models harder to steal while preserving utility for benign users. However, existing defenses have poor performance in practice, either requiring enormous computational overheads or severe utility trade-offs. To meet these challenges, we present a new approach to model stealing defenses called gradient redirection. At the core of our approach is a provably optimal, efficient algorithm for steering an adversary's training updates in a targeted manner. Combined with improvements to surrogate networks and a novel coordinated defense strategy, our gradient redirection defense, called GRAD${}^2$, achieves small utility trade-offs and low computational overhead, outperforming the best prior defenses. Moreover, we demonstrate how gradient redirection enables reprogramming the adversary with arbitrary behavior, which we hope will foster work on new avenues of defense.
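The abstract states that the core of the defense is a provably optimal, efficient algorithm for steering the adversary's training updates in a targeted manner. As a rough illustration only, and not the authors' implementation, the sketch below assumes the adversary trains a surrogate with cross-entropy, whose gradient is linear in the served posterior; under that assumption, redirecting the update toward a target direction reduces to a small linear program over the probability simplex with a closed-form greedy solution. The function name redirect_posterior, the per-class gradient matrix G, and the L1 budget eps are all illustrative assumptions.

```python
import numpy as np

def redirect_posterior(y, G, t, eps):
    """Hypothetical sketch of gradient redirection on one query.

    y:   original posterior over K classes (nonnegative, sums to 1)
    G:   (d, K) matrix whose columns are per-class gradient directions,
         so the adversary's update on this query is roughly G @ y
         (holds for cross-entropy, which is linear in the label)
    t:   (d,) target direction for the adversary's update
    eps: L1 budget on the perturbation (controls the utility trade-off)

    Maximizing <G @ y_tilde, t> over the simplex subject to
    ||y_tilde - y||_1 <= eps is a linear program; the optimum moves
    eps/2 probability mass from the worst-aligned classes to the
    single best-aligned class.
    """
    s = G.T @ t                   # per-class alignment scores
    y_tilde = y.astype(float).copy()
    best = int(np.argmax(s))
    budget = eps / 2.0            # eps/2 mass drained, eps/2 mass added
    # Drain mass from the lowest-scoring classes first.
    for k in np.argsort(s):
        if k == best or budget <= 0:
            continue
        delta = min(y_tilde[k], budget)
        y_tilde[k] -= delta
        y_tilde[best] += delta
        budget -= delta
    return y_tilde
```

Because the objective is linear in the served posterior, this greedy transfer of mass is exactly optimal under the stated assumptions, which is consistent with the abstract's claim that the steering step is both provably optimal and cheap enough to add little computational overhead per query.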