Paper Title

Probabilistic Contrastive Principal Component Analysis

Authors

Didong Li, Andrew Jones, Barbara Engelhardt

Abstract

Dimension reduction is useful for exploratory data analysis. In many applications, it is of interest to discover variation that is enriched in a "foreground" dataset relative to a "background" dataset. Recently, contrastive principal component analysis (CPCA) was proposed for this setting. However, the lack of a formal probabilistic model makes it difficult to reason about CPCA and to tune its hyperparameter. In this work, we propose probabilistic contrastive principal component analysis (PCPCA), a model-based alternative to CPCA. We discuss how to set the hyperparameter in theory and in practice, and we show several of PCPCA's advantages over CPCA, including greater interpretability, uncertainty quantification and principled inference, robustness to noise and missing data, and the ability to generate data from the model. We demonstrate PCPCA's performance through a series of simulations and case-control experiments with datasets of gene expression, protein expression, and images.
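
The abstract gives no formulas, so the following is only a minimal sketch of the CPCA baseline it refers to, not of PCPCA itself. It assumes the usual CPCA formulation, in which the contrastive directions are the leading eigenvectors of C_fg - alpha * C_bg, where C_fg and C_bg are the foreground and background sample covariances and alpha is the single hyperparameter the abstract describes as hard to tune. The function name `cpca_directions` and the toy data are hypothetical and chosen for illustration.

```python
import numpy as np


def cpca_directions(foreground, background, alpha=1.0, n_components=2):
    """Return the leading eigenvectors (as columns) and eigenvalues of
    C_fg - alpha * C_bg, assuming the standard CPCA formulation."""
    # Center each dataset on its own mean.
    fg = foreground - foreground.mean(axis=0)
    bg = background - background.mean(axis=0)
    # Sample covariance matrices (features x features).
    c_fg = fg.T @ fg / fg.shape[0]
    c_bg = bg.T @ bg / bg.shape[0]
    # The contrast matrix is symmetric but may be indefinite.
    contrast = c_fg - alpha * c_bg
    eigvals, eigvecs = np.linalg.eigh(contrast)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order], eigvals[order]


# Toy data (hypothetical): both datasets share large variance on axis 0,
# while only the foreground has extra variance on axis 1.
rng = np.random.default_rng(0)
background = rng.normal(size=(2000, 3)) * np.array([5.0, 1.0, 1.0])
foreground = rng.normal(size=(2000, 3)) * np.array([5.0, 4.0, 1.0])

directions, values = cpca_directions(foreground, background, alpha=1.0, n_components=1)
# Index of the feature that dominates the top contrastive direction.
top_axis = int(np.argmax(np.abs(directions[:, 0])))
# For comparison: the feature with the largest foreground variance,
# which is what ordinary PCA on the foreground alone would favor.
pca_axis = int(np.argmax(foreground.var(axis=0)))
```

In this toy setup, ordinary PCA on the foreground favors the shared high-variance axis, while the contrastive direction concentrates on the axis whose variance is enriched in the foreground. The result depends on the choice of `alpha`, which is the tuning difficulty that motivates the model-based treatment described in the abstract.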
