Paper title
TIPRDC: Task-Independent Privacy-Respecting Data Crowdsourcing Framework for Deep Learning with Anonymized Intermediate Representations
Paper authors
Paper abstract
The success of deep learning is due in part to the availability of various large-scale datasets. These datasets are often crowdsourced from individual users and contain private information such as gender and age. Users' growing privacy concerns about data sharing hinder the generation and use of crowdsourced datasets and lead to a shortage of training data for new deep learning applications. One naive solution is to pre-process the raw data on the user side to extract features, and then send only the extracted features to the data collector. Unfortunately, attackers can still exploit these extracted features to train an adversary classifier that infers private attributes. Some prior work has leveraged game theory to protect private attributes. However, these defenses are designed for known primary learning tasks, so the extracted features work poorly for unknown learning tasks. To tackle the case where the learning task may be unknown or changing, we present TIPRDC, a task-independent privacy-respecting data crowdsourcing framework with anonymized intermediate representations. The goal of this framework is to learn a feature extractor that hides private information from the intermediate representations while maximally retaining the original information embedded in the raw data, so that the data collector can accomplish unknown learning tasks. We design a hybrid training method to learn the anonymized intermediate representation: (1) an adversarial training process that hides private information from the features; (2) a neural-network-based mutual information estimator that is used to maximally retain the original information.
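As a rough illustration of how the two training signals in the hybrid method might be combined, the objectives can be sketched as follows. The notation is assumed here for illustration and is not taken from the abstract: $E_\theta$ is the feature extractor, $C_\phi$ is an adversary classifier that predicts the private attribute $u$ from the features $E_\theta(x)$, $\hat{I}_\psi$ is a neural estimate of the mutual information between the raw data and the features, and $\lambda \in [0, 1]$ is a hypothetical trade-off weight. The exact losses used by the framework may differ.

```latex
\begin{aligned}
\phi^{*}   &= \arg\max_{\phi}\; \mathbb{E}_{(x,u)}\big[\log C_{\phi}\big(u \mid E_{\theta}(x)\big)\big]
            && \text{(adversary: infer the private attribute)} \\
\psi^{*}   &= \arg\max_{\psi}\; \hat{I}_{\psi}\big(x;\, E_{\theta}(x)\big)
            && \text{(estimator: tighten the mutual information estimate)} \\
\theta^{*} &= \arg\min_{\theta}\; \lambda\, \mathbb{E}_{(x,u)}\big[\log C_{\phi^{*}}\big(u \mid E_{\theta}(x)\big)\big]
              \;-\; (1-\lambda)\, \hat{I}_{\psi^{*}}\big(x;\, E_{\theta}(x)\big)
            && \text{(extractor: hide $u$, retain information about $x$)}
\end{aligned}
```

Under these assumptions, a larger $\lambda$ puts more weight on hiding the private attribute, while a smaller $\lambda$ favors retaining information about the raw data. In practice the three updates would typically be alternated rather than solved to optimality.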