Paper title

CAST: A Correlation-based Adaptive Spectral Clustering Algorithm on Multi-scale Data

Authors

Li, Xiang; Kao, Ben; Shan, Caihua; Yin, Dawei; Ester, Martin

Abstract

We study the problem of applying spectral clustering to cluster multi-scale data, which is data whose clusters are of various sizes and densities. Traditional spectral clustering techniques discover clusters by processing a similarity matrix that reflects the proximity of objects. For multi-scale data, distance-based similarity is not effective because objects of a sparse cluster could be far apart while those of a dense cluster have to be sufficiently close. Following [16], we solve the problem of spectral clustering on multi-scale data by integrating the concept of objects' "reachability similarity" with a given distance-based similarity to derive an objects' coefficient matrix. We propose the algorithm CAST that applies trace Lasso to regularize the coefficient matrix. We prove that the resulting coefficient matrix has the "grouping effect" and that it exhibits "sparsity". We show that these two characteristics imply very effective spectral clustering. We evaluate CAST and 10 other clustering methods on a wide range of datasets w.r.t. various measures. Experimental results show that CAST provides excellent performance and is highly robust across test cases of multi-scale data.
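
The abstract names trace Lasso as the regularizer but does not spell out its form. As a rough illustration only, the sketch below computes the trace Lasso norm as it is commonly defined in the literature, the nuclear norm of `X @ diag(w)`, and shows why it is associated with both sparsity and a grouping effect: it behaves like the L1 norm when the columns of `X` are uncorrelated and like the L2 norm when they are perfectly correlated. The function name `trace_lasso` and the toy matrices are assumptions made for this example and are not taken from the paper; CAST's actual objective may differ.

```python
import numpy as np


def trace_lasso(X: np.ndarray, w: np.ndarray) -> float:
    """Trace Lasso norm of w with respect to X: the nuclear norm of X @ diag(w).

    Hypothetical helper for illustration; not the paper's implementation.
    """
    # Broadcasting X * w scales column j of X by w[j], which equals X @ diag(w).
    return float(np.linalg.norm(X * w, ord="nuc"))


w = np.array([3.0, -4.0])

# Orthonormal (uncorrelated) columns: the norm reduces to the L1 norm of w.
X_orth = np.eye(2)
l1_case = trace_lasso(X_orth, w)

# Identical unit-norm (perfectly correlated) columns: it reduces to the L2 norm of w.
X_same = np.array([[1.0, 1.0],
                   [0.0, 0.0]])
l2_case = trace_lasso(X_same, w)

print(l1_case, np.abs(w).sum())       # both are 7 up to floating-point error
print(l2_case, np.linalg.norm(w))     # both are 5 up to floating-point error
```

In this toy setting the norm adapts to the correlation structure of the data: it penalizes like L1 (encouraging sparsity) across dissimilar objects and like L2 (encouraging similar coefficients) across highly correlated ones, which is the intuition behind the two properties the abstract claims for the coefficient matrix.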
