Title
Adaptive Data Analysis with Correlated Observations
Authors
Abstract
The vast majority of the work on adaptive data analysis focuses on the case where the samples in the dataset are independent. Several approaches and tools have been successfully applied in this context, such as differential privacy, max-information, compression arguments, and more. The situation is far less well-understood without the independence assumption. We embark on a systematic study of the possibilities of adaptive data analysis with correlated observations. First, we show that, in some cases, differential privacy guarantees generalization even when there are dependencies within the sample, which we quantify using a notion we call Gibbs-dependence. We complement this result with a tight negative example. Second, we show that the connection between transcript-compression and adaptive data analysis can be extended to the non-iid setting.
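For context, the i.i.d. baseline that the abstract contrasts against can be sketched as follows. This is a minimal sketch, assuming an i.i.d. sample and statistical queries that map each sample point into [0, 1]. It shows the standard Laplace mechanism for answering adaptively chosen queries, not the method of this paper, and it does not model Gibbs-dependence. The names `LaplaceSQMechanism` and `laplace_noise` are hypothetical and introduced only for illustration.

```python
import random
from typing import Any, Callable, Optional, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class LaplaceSQMechanism:
    """Hypothetical helper: answers statistical queries q: X -> [0, 1], epsilon-DP per query."""

    def __init__(self, sample: Sequence[Any], epsilon: float, seed: Optional[int] = None):
        if not sample:
            raise ValueError("sample must be non-empty")
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        self.sample = list(sample)
        self.epsilon = epsilon
        self.rng = random.Random(seed)

    def answer(self, query: Callable[[Any], float]) -> float:
        n = len(self.sample)
        values = [query(x) for x in self.sample]
        if any(v < 0.0 or v > 1.0 for v in values):
            raise ValueError("query must map into [0, 1]")
        empirical_mean = sum(values) / n
        # Changing one sample point moves the mean by at most 1/n (its sensitivity),
        # so Laplace noise of scale 1/(n * epsilon) makes this single answer epsilon-DP.
        return empirical_mean + laplace_noise(1.0 / (n * self.epsilon), self.rng)


if __name__ == "__main__":
    data_rng = random.Random(0)
    sample = [data_rng.random() for _ in range(10_000)]  # i.i.d. Uniform[0, 1] draws
    mech = LaplaceSQMechanism(sample, epsilon=0.5, seed=1)
    threshold = 0.5
    for _ in range(3):
        # The next query depends on the previous noisy answer: this is the adaptivity.
        ans = mech.answer(lambda x, t=threshold: 1.0 if x < t else 0.0)
        print(f"P[x < {threshold:.3f}] ~ {ans:.3f}")
        threshold = ans
```

Each call is ε-differentially private on its own; under basic composition, answering k queries this way is kε-differentially private overall. The generalization guarantees derived from this privacy bound are the ones that rely on the independence assumption discussed above.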