Paper Title

Scaling Graph Clustering with Distributed Sketches

Authors

Priest, Benjamin W., Dunton, Alec, Sanders, Geoffrey

Abstract

The unsupervised learning of community structure, in particular the partitioning of vertices into clusters or communities, is a canonical and well-studied problem in exploratory graph analysis. However, like most graph analyses, the introduction of immense scale presents challenges to traditional methods. Spectral clustering in distributed memory, for example, requires hundreds of expensive bulk-synchronous communication rounds to compute an embedding of vertices onto a few eigenvectors of a graph-associated matrix. Furthermore, the whole computation may need to be repeated if the underlying graph changes by some low percentage of edge updates. We present a method inspired by spectral clustering in which we instead use matrix sketches derived from random dimension-reducing projections. We show that our method produces embeddings that yield performant clustering results given a fully-dynamic stochastic block model stream, using both the fast Johnson-Lindenstrauss and CountSketch transforms. We also discuss the effects of stochastic block model parameters on the required dimensionality of the subsequent embeddings, and show how random projections could significantly improve the performance of graph clustering in distributed memory.
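To make the core idea concrete, the following is a minimal, hedged sketch (not the authors' implementation, and single-node rather than distributed): it samples a toy two-block stochastic block model adjacency matrix and embeds its vertices with a CountSketch-style random projection, i.e. each column is hashed to one of `k` buckets with a random sign. All names (`countsketch`, the SBM parameters) are illustrative assumptions, using only NumPy.

```python
import numpy as np

def countsketch(A, k, rng):
    """Project each row of the n x n matrix A down to k dimensions via
    a CountSketch-style map: each column is hashed to one of k buckets
    and multiplied by a random sign (illustrative, not the paper's code)."""
    n = A.shape[1]
    buckets = rng.integers(0, k, size=n)      # hash h: column -> bucket
    signs = rng.choice([-1.0, 1.0], size=n)   # sign s: column -> +/-1
    S = np.zeros((n, k))
    S[np.arange(n), buckets] = signs          # sparse n x k sketch matrix
    return A @ S                              # n x k vertex embedding

# Toy two-block stochastic block model (hypothetical parameters).
rng = np.random.default_rng(0)
n, k = 200, 8
labels = np.repeat([0, 1], n // 2)
P = np.where(labels[:, None] == labels[None, :], 0.3, 0.02)
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                                   # symmetric, no self-loops

E = countsketch(A, k, rng)                    # embedding to feed a clusterer
```

The resulting `E` would then be handed to an off-the-shelf clusterer such as k-means. Because the projection is linear, an edge insertion or deletion only adjusts a few entries of the sketch, which is what makes this style of embedding attractive for the fully-dynamic edge streams discussed in the abstract.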
