Paper Title
Studying the Impact of Data Disclosure Mechanism in Recommender Systems via Simulation
Paper Authors
Paper Abstract
Recently, privacy issues in web services that rely on users' personal data have attracted significant attention. Unlike existing privacy-preserving technologies such as federated learning and differential privacy, we explore another way to mitigate users' privacy concerns: giving them control over their own data. Toward this goal, we propose a privacy-aware recommendation framework that gives users fine-grained control over their personal data, including implicit behaviors such as clicks and watches. In this new framework, users can proactively control which data to disclose based on the trade-off between anticipated privacy risks and potential utility. We then study users' privacy decision-making under different data disclosure mechanisms and recommendation models, and how their disclosure decisions affect the recommender system's performance. To avoid the high cost of real-world experiments, we apply simulation to study the effects of the proposed framework. Specifically, we propose a reinforcement learning algorithm to simulate the decisions of users (with various privacy sensitivities) under three proposed platform mechanisms, on two datasets and with three representative recommendation models. The simulation results show that platform mechanisms with finer split granularity and less restrictive disclosure strategies yield better outcomes for both end users and platforms than the "all or nothing" binary mechanism adopted by most real-world applications. The results also show that the proposed framework can effectively protect users' privacy, since users obtain comparable or even better recommendations while disclosing much less data.
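To make the disclosure trade-off concrete, the sketch below simulates a single user learning which interaction records to disclose. It is not the paper's algorithm: the linear utility-minus-risk reward, the synthetic `gain` and `risk` values, and the privacy sensitivity `lambda_` are all illustrative assumptions, and the learner is a generic REINFORCE-style policy over independent per-record disclosure probabilities.

```python
# Minimal sketch (illustrative assumptions, not the paper's method): a user
# learns, record by record, whether disclosing an interaction is worth it.
import numpy as np

rng = np.random.default_rng(0)

n_records = 20                              # interaction records the user owns
gain = rng.uniform(0.0, 1.0, n_records)     # assumed recommendation gain if disclosed
risk = rng.uniform(0.0, 1.0, n_records)     # assumed privacy risk if disclosed
lambda_ = 0.8                               # user's privacy sensitivity (assumption)

theta = np.zeros(n_records)                 # per-record disclosure logits (policy)
lr, baseline = 0.5, 0.0

def reward(disclose):
    # The trade-off the abstract describes: utility from better
    # recommendations minus the anticipated privacy risk of disclosure.
    return float(disclose @ gain - lambda_ * (disclose @ risk))

for step in range(2000):
    p = 1.0 / (1.0 + np.exp(-theta))        # current disclosure probabilities
    disclose = (rng.random(n_records) < p).astype(float)
    r = reward(disclose)
    baseline += 0.05 * (r - baseline)       # running-mean baseline for variance reduction
    # REINFORCE gradient for independent Bernoulli disclosure actions.
    theta += lr * (r - baseline) * (disclose - p)

final = 1.0 / (1.0 + np.exp(-theta)) > 0.5
optimum = gain - lambda_ * risk > 0         # greedy per-record optimum for comparison
print(f"policy discloses {final.sum()}/{n_records} records "
      f"(greedy optimum: {optimum.sum()})")
```

Under this reward, the learned policy tends toward disclosing exactly the records whose gain-to-risk margin is positive for the given sensitivity, which mirrors the paper's finding that users can reach comparable utility while disclosing much less data; a coarser "all or nothing" mechanism corresponds to forcing all entries of `disclose` to share one value.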