Paper Title

An Overview of Privacy in Machine Learning

Paper Author

De Cristofaro, Emiliano

Paper Abstract

Over the past few years, providers such as Google, Microsoft, and Amazon have started to provide customers with access to software interfaces allowing them to easily embed machine learning tasks into their applications. Overall, organizations can now use Machine Learning as a Service (MLaaS) engines to outsource complex tasks, e.g., training classifiers, performing predictions, clustering, etc. They can also let others query models trained on their data. Naturally, this approach can also be used (and is often advocated) in other contexts, including government collaborations, citizen science projects, and business-to-business partnerships. However, if malicious users were able to recover data used to train these models, the resulting information leakage would create serious issues. Likewise, if the inner parameters of the model are considered proprietary information, then access to the model should not allow an adversary to learn such parameters. In this document, we set to review privacy challenges in this space, providing a systematic review of the relevant research literature, also exploring possible countermeasures. More specifically, we provide ample background information on relevant concepts around machine learning and privacy. Then, we discuss possible adversarial models and settings, cover a wide range of attacks that relate to private and/or sensitive information leakage, and review recent results attempting to defend against such attacks. Finally, we conclude with a list of open problems that require more work, including the need for better evaluations, more targeted defenses, and the study of the relation to policy and data protection efforts.
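
To make the abstract's notion of training-data leakage concrete, the following is a minimal, illustrative sketch (not taken from the paper) of a confidence-thresholding membership inference attack against a black-box classifier. The dataset, model, and threshold are assumptions chosen purely for demonstration; the point is only that an overfit model tends to answer more confidently on records it was trained on, which an adversary with query access can exploit.

```python
# Minimal sketch: confidence-based membership inference against a black-box model.
# Assumptions: scikit-learn is available, the attacker can query predict_proba,
# and the 0.95 threshold is picked naively for illustration only.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# The "victim" model, likely overfit on its training set (as an MLaaS-hosted model might be).
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

def membership_guess(model, X, threshold=0.95):
    """Guess 'member' whenever the model is very confident about its prediction."""
    confidence = model.predict_proba(X).max(axis=1)
    return confidence >= threshold

# Training records (members) tend to receive higher confidence than unseen records.
member_rate = membership_guess(model, X_train).mean()
nonmember_rate = membership_guess(model, X_test).mean()
print(f"Guessed 'member' for {member_rate:.0%} of training records "
      f"vs. {nonmember_rate:.0%} of unseen records")
```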
