Paper Title

Quantifying Privacy Risks of Masked Language Models Using Membership Inference Attacks

Paper Authors

Fatemehsadat Mireshghallah, Kartik Goyal, Archit Uniyal, Taylor Berg-Kirkpatrick, Reza Shokri

Paper Abstract

The wide adoption and application of Masked Language Models (MLMs) on sensitive data (from legal to medical) necessitates a thorough quantitative investigation into their privacy vulnerabilities -- to what extent do MLMs leak information about their training data? Prior attempts at measuring leakage of MLMs via membership inference attacks have been inconclusive, implying the potential robustness of MLMs to privacy attacks. In this work, we posit that prior attempts were inconclusive because they based their attack solely on the MLM's model score. We devise a stronger membership inference attack based on likelihood ratio hypothesis testing that involves an additional reference MLM to more accurately quantify the privacy risks of memorization in MLMs. We show that masked language models are extremely susceptible to likelihood ratio membership inference attacks: Our empirical results, on models trained on medical notes, show that our attack improves the AUC of prior membership inference attacks from 0.66 to an alarmingly high 0.90 level, with a significant improvement in the low-error region: at 1% false positive rate, our attack is 51X more powerful than prior work.
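To make the attack concrete, below is a minimal sketch of the likelihood-ratio test the abstract describes: score a candidate sample under the target MLM and a reference MLM, and flag it as a training member if the log-ratio exceeds a threshold. The sketch assumes HuggingFace `transformers` checkpoints; the model paths are placeholders, and the use of pseudo-log-likelihood (masking one token at a time) as the MLM score is a common approximation, not necessarily the paper's exact implementation.

```python
# Sketch of a likelihood-ratio membership inference test for MLMs.
# Assumptions: BERT-style tokenization ([CLS] ... [SEP]) and placeholder
# checkpoint paths; swap in the actual fine-tuned target model.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

def pseudo_log_likelihood(model, tokenizer, text):
    """Sum log-probabilities of each token, masking one position at a time."""
    input_ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    total = 0.0
    with torch.no_grad():
        for i in range(1, len(input_ids) - 1):  # skip [CLS] and [SEP]
            masked = input_ids.clone()
            true_id = masked[i].item()
            masked[i] = tokenizer.mask_token_id
            logits = model(input_ids=masked.unsqueeze(0)).logits[0, i]
            total += torch.log_softmax(logits, dim=-1)[true_id].item()
    return total

def likelihood_ratio_score(target, reference, tokenizer, text):
    # Higher scores indicate the sample is much more likely under the
    # target than under the reference, suggesting memorization.
    return (pseudo_log_likelihood(target, tokenizer, text)
            - pseudo_log_likelihood(reference, tokenizer, text))

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
target = AutoModelForMaskedLM.from_pretrained("path/to/finetuned-target").eval()
reference = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

score = likelihood_ratio_score(target, reference, tokenizer,
                               "Patient presents with chest pain.")
is_member = score > 0.0  # in practice, calibrate the threshold on held-out data
```

The key design choice, per the abstract, is the reference model: normalizing by a generic MLM's score filters out samples that are intrinsically easy to predict, isolating the target model's memorization signal.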
