论文标题
马尔可夫决策过程中保存隐私的政策综合
Privacy-Preserving Policy Synthesis in Markov Decision Processes
论文作者
论文摘要
在决策问题中,代理商的行动可能会揭示推动其决策的敏感信息。例如,公司的投资决策可能会揭示其对市场动态的敏感知识。为了防止这种类型的信息泄漏,我们介绍了一种策略综合算法,该算法在马尔可夫决策过程中保护过渡概率的隐私。我们将差异隐私用作隐私的数学定义。该算法首先使用提供差异隐私的机制来散布过渡概率。然后,基于私有化的过渡概率,我们使用动态编程合成策略。我们的主要贡献是约束“隐私成本”,即,与隐私的预期总奖励与没有隐私的预期总奖励之间的差异。我们还表明,计算隐私成本具有时间复杂性,在问题的参数中是多项式的。此外,我们确定隐私成本随着差异隐私保护的强度而增加,我们量化了这一增加。最后,在两个示例环境上进行的数值实验验证了隐私成本与数据隐私保护的强度之间的既定关系。
In decision-making problems, the actions of an agent may reveal sensitive information that drives its decisions. For instance, a corporation's investment decisions may reveal its sensitive knowledge about market dynamics. To prevent this type of information leakage, we introduce a policy synthesis algorithm that protects the privacy of the transition probabilities in a Markov decision process. We use differential privacy as the mathematical definition of privacy. The algorithm first perturbs the transition probabilities using a mechanism that provides differential privacy. Then, based on the privatized transition probabilities, we synthesize a policy using dynamic programming. Our main contribution is to bound the "cost of privacy," i.e., the difference between the expected total rewards with privacy and the expected total rewards without privacy. We also show that computing the cost of privacy has time complexity that is polynomial in the parameters of the problem. Moreover, we establish that the cost of privacy increases with the strength of differential privacy protections, and we quantify this increase. Finally, numerical experiments on two example environments validate the established relationship between the cost of privacy and the strength of data privacy protections.