Paper title
Improved Generalization Guarantees in Restricted Data Models
Paper authors
Paper abstract
Differential privacy is known to protect against threats to validity incurred due to adaptive, or exploratory, data analysis -- even when the analyst adversarially searches for a statistical estimate that diverges from the true value of the quantity of interest on the underlying population. The cost of this protection is the accuracy loss incurred by differential privacy. In this work, inspired by standard models in the genomics literature, we consider data models in which individuals are represented by a sequence of attributes with the property that distant attributes are only weakly correlated. We show that, under this assumption, it is possible to "re-use" privacy budget on different portions of the data, significantly improving accuracy without increasing the risk of overfitting.
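
To make the accuracy cost mentioned in the abstract concrete, the following is a minimal arithmetic sketch and not the paper's mechanism or its guarantees. It assumes the standard Laplace mechanism for mean queries over values in [0, 1], basic (linear) composition of the privacy budget, and an idealized setting in which the attributes split into blocks that can each be charged the full budget. The function name `laplace_scale` and all parameter values (`n`, `epsilon`, `k`, `blocks`) are hypothetical choices made for illustration only.

```python
def laplace_scale(n: int, epsilon_total: float, num_queries: int) -> float:
    """Laplace noise scale for one mean query over [0, 1]-valued data.

    Assumes basic composition: the total budget is split evenly
    across `num_queries` queries (an illustrative assumption).
    """
    if n <= 0 or epsilon_total <= 0 or num_queries <= 0:
        raise ValueError("n, epsilon_total and num_queries must be positive")
    sensitivity = 1.0 / n  # changing one individual moves a mean by at most 1/n
    epsilon_per_query = epsilon_total / num_queries
    return sensitivity / epsilon_per_query


# Hypothetical parameters, chosen only for illustration.
n = 1000        # individuals in the data set
epsilon = 1.0   # total privacy budget
k = 100         # total number of adaptive queries
blocks = 10     # idealized number of weakly correlated attribute blocks

# One budget shared by all k queries.
shared_scale = laplace_scale(n, epsilon, k)

# Idealized re-use: each block answers k / blocks queries with the full budget.
reused_scale = laplace_scale(n, epsilon, k // blocks)

print(f"noise scale, shared budget:  {shared_scale:.4f}")
print(f"noise scale, re-used budget: {reused_scale:.4f}")
```

Under these assumptions the per-query noise scale shrinks by a factor equal to the number of blocks. The actual improvement established in the paper depends on how weak the correlation between distant attributes is, which this sketch does not model.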