Paper Title

Approximate Data Deletion in Generative Models

Authors

Zhifeng Kong, Scott Alfeld

Abstract

Users have the right to have their data deleted by third-party learned systems, as codified by recent legislation such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). Such data deletion can be accomplished by full re-training, but this incurs a high computational cost for modern machine learning models. To avoid this cost, many approximate data deletion methods have been developed for supervised learning. Unsupervised learning, in contrast, remains largely an open problem when it comes to (approximate or exact) efficient data deletion. In this paper, we propose a density-ratio-based framework for generative models. Using this framework, we introduce a fast method for approximate data deletion and a statistical test for estimating whether or not training points have been deleted. We provide theoretical guarantees under various learner assumptions and empirically demonstrate our methods across a variety of generative methods.
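
The abstract only names the density-ratio framework at a high level. As a rough illustration of how a density-ratio approach to approximate deletion can look in practice, the sketch below estimates the ratio between the data distribution without the deleted points and the full-data distribution using a probabilistic classifier, then rejection-samples the original generator's outputs in proportion to that ratio. This is a generic sketch, not the authors' algorithm: the function names (fit_ratio_classifier, deletion_resample), the classifier-based ratio estimator, and the toy data are all assumptions made for illustration.

```python
# A minimal, illustrative sketch of a density-ratio approach to approximate
# data deletion in a generative model. This is NOT the paper's implementation;
# function names, the classifier-based ratio estimator, and the toy data are
# assumptions made for illustration only.

import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_ratio_classifier(full_data, remaining_data):
    """Estimate r(x) = p_remaining(x) / p_full(x) by probabilistic classification:
    train a classifier to separate 'remaining' (label 1) from 'full' (label 0);
    the class odds, corrected for class sizes, approximate the density ratio."""
    X = np.vstack([full_data, remaining_data])
    y = np.concatenate([np.zeros(len(full_data)), np.ones(len(remaining_data))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    prior = len(remaining_data) / len(full_data)

    def ratio(x):
        p = clf.predict_proba(x)[:, 1]            # P(label = remaining | x)
        return (p / (1.0 - p + 1e-12)) / prior    # odds / class prior ~ density ratio
    return ratio


def deletion_resample(model_samples, ratio, rng):
    """Rejection-sample outputs of the original generator with acceptance
    probability proportional to r(x), approximating samples from a model
    that was never trained on the deleted points."""
    r = ratio(model_samples)
    accept = rng.uniform(size=len(r)) < r / (r.max() + 1e-12)
    return model_samples[accept]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    full = rng.normal(0.0, 1.0, size=(2000, 2))            # toy training set
    remaining = full[full[:, 0] <= 1.0]                     # drop points a user deleted
    ratio = fit_ratio_classifier(full, remaining)
    generator_out = rng.normal(0.0, 1.0, size=(5000, 2))    # stand-in for model samples
    kept = deletion_resample(generator_out, ratio, rng)
    print(f"accepted {len(kept)} of {len(generator_out)} generator samples")
```

The appeal of this style of approach, and the reason approximate deletion can be cheaper than retraining, is that the original generator is left untouched: only a lightweight ratio estimator is fit, and deletion is simulated at sampling time.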
