Paper title
Time-inhomogeneous diffusion geometry and topology
Paper authors
Paper abstract
Diffusion condensation is a dynamic process that yields a sequence of multiscale data representations that aim to encode meaningful abstractions. It has proven effective for manifold learning, denoising, clustering, and visualization of high-dimensional data. Diffusion condensation is constructed as a time-inhomogeneous process where each step first computes and then applies a diffusion operator to the data. We theoretically analyze the convergence and evolution of this process from geometric, spectral, and topological perspectives. From a geometric perspective, we obtain convergence bounds based on the smallest transition probability and the radius of the data, whereas from a spectral perspective, our bounds are based on the eigenspectrum of the diffusion kernel. Our spectral results are of particular interest since most of the literature on data diffusion is focused on homogeneous processes. From a topological perspective, we show diffusion condensation generalizes centroid-based hierarchical clustering. We use this perspective to obtain a bound based on the number of data points, independent of their location. To understand the evolution of the data geometry beyond convergence, we use topological data analysis. We show that the condensation process itself defines an intrinsic condensation homology. We use this intrinsic topology as well as the ambient persistent homology of the condensation process to study how the data changes over diffusion time. We demonstrate both types of topological information in well-understood toy examples. Our work gives theoretical insights into the convergence of diffusion condensation, and shows that it provides a link between topological and geometric data analysis.
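The loop described in the abstract, where each step first computes a diffusion operator from the current data and then applies it, can be illustrated with a minimal sketch. The abstract does not specify the kernel, bandwidth, or stopping rule, so the fixed-bandwidth Gaussian kernel, the parameter `epsilon`, the tolerance `tol`, and all function names below are assumptions made for illustration only, not the authors' implementation.

```python
import numpy as np


def pairwise_sq_dists(X):
    # Squared Euclidean distance between every pair of rows of X.
    diff = X[:, None, :] - X[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def data_diameter(X):
    # Largest pairwise distance in the current point cloud.
    return float(np.sqrt(pairwise_sq_dists(X).max()))


def diffusion_operator(X, epsilon):
    # Assumed kernel: Gaussian with fixed bandwidth epsilon.
    # Row normalization turns the kernel into a Markov (row-stochastic) matrix.
    K = np.exp(-pairwise_sq_dists(X) / epsilon)
    return K / K.sum(axis=1, keepdims=True)


def diffusion_condensation(X, epsilon=1.0, n_steps=50, tol=1e-6):
    # Time-inhomogeneous process: the operator is recomputed from the
    # current data at every step, then applied to that data.
    X = np.asarray(X, dtype=float)
    history = [X.copy()]
    for _ in range(n_steps):
        P = diffusion_operator(X, epsilon)  # compute operator from current data
        X = P @ X                           # apply it: each point moves to a weighted mean
        history.append(X.copy())
        if data_diameter(X) < tol:          # assumed stopping rule: points have merged
            break
    return history


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.normal(size=(20, 2))
    steps = diffusion_condensation(points, epsilon=1.0, n_steps=30)
    print([round(data_diameter(S), 4) for S in steps[:5]])
```

Because each row of the operator is a probability vector, every updated point is a convex combination of the previous points, so the diameter of the data cannot grow from one step to the next. That is the quantity the geometric convergence bounds in the abstract refer to as the radius of the data.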