Paper title
Understanding How Dimension Reduction Tools Work: An Empirical Approach to Deciphering t-SNE, UMAP, TriMAP, and PaCMAP for Data Visualization
Paper authors
Paper abstract
Dimension reduction (DR) techniques such as t-SNE, UMAP, and TriMAP have demonstrated impressive visualization performance on many real-world datasets. One tension that has always faced these methods is the trade-off between preservation of global structure and preservation of local structure: these methods can handle one or the other, but not both. In this work, our main goal is to understand which aspects of DR methods are important for preserving both local and global structure: it is difficult to design a better method without a true understanding of the choices we make in our algorithms and their empirical impact on the lower-dimensional embeddings they produce. Towards the goal of local structure preservation, we provide several useful design principles for DR loss functions based on our new understanding of the mechanisms behind successful DR methods. Towards the goal of global structure preservation, our analysis shows that the choice of which components to preserve is important. We leverage these insights to design a new algorithm for DR, called Pairwise Controlled Manifold Approximation Projection (PaCMAP), which preserves both local and global structure. Our work provides several unexpected insights into which design choices to make, and which to avoid, when constructing DR algorithms.
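The abstract contrasts preservation of local structure with preservation of global structure. As a hedged illustration of what those two goals can mean in practice, the sketch below scores an embedding in two assumed ways: local structure as the average fraction of each point's k nearest neighbors that survive the embedding, and global structure as the fraction of random triplets whose relative distance order is unchanged. These are common evaluation choices assumed here for illustration, not necessarily the exact metrics used in the paper; the function names `knn_preservation` and `random_triplet_accuracy` are hypothetical.

```python
import math
import random


def _knn(points, i, k):
    """Indices of the k nearest neighbors of points[i] (Euclidean), excluding i."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return {j for _, j in dists[:k]}


def knn_preservation(high, low, k=5):
    """Hypothetical local-structure score in [0, 1]: the average fraction of each
    point's k nearest neighbors in `high` that are still among its k nearest
    neighbors in the embedding `low`."""
    n = len(high)
    total = 0.0
    for i in range(n):
        total += len(_knn(high, i, k) & _knn(low, i, k)) / k
    return total / n


def random_triplet_accuracy(high, low, n_triplets=1000, seed=0):
    """Hypothetical global-structure score in [0, 1]: the fraction of random
    triplets (i, j, l) for which "j is closer to i than l is" holds in `high`
    exactly when it holds in `low`."""
    rng = random.Random(seed)
    n = len(high)
    agree = 0
    for _ in range(n_triplets):
        i, j, l = rng.sample(range(n), 3)
        closer_high = math.dist(high[i], high[j]) < math.dist(high[i], high[l])
        closer_low = math.dist(low[i], low[j]) < math.dist(low[i], low[l])
        agree += closer_high == closer_low
    return agree / n_triplets


if __name__ == "__main__":
    # Toy 3-D data "embedded" by simply dropping the last coordinate.
    data = [(float(i), float(i * i % 7), float(i % 3)) for i in range(20)]
    embedding = [(x, y) for x, y, _ in data]
    print("local (kNN preservation):", knn_preservation(data, embedding, k=3))
    print("global (triplet accuracy):", random_triplet_accuracy(data, embedding))
```

An embedding can score well on one measure and poorly on the other, which is one concrete way to see the trade-off the abstract describes.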