论文标题
估计大图的描述符
Estimating Descriptors for Large Graphs
论文作者
论文摘要
将网络嵌入固定的尺寸特征空间中,同时保留其基本结构属性是图分析中的基本任务。这些特征向量(图形描述符)用于测量图之间的成对相似性。这可以在图形结构化数据上应用数据挖掘算法(例如分类,聚类或异常检测),这些数据在多个域中具有许多应用程序。用于计算描述符的最先进的算法要求整个图都在内存中,需要一个巨大的内存足迹,因此不能很好地扩展到增加实际世界网络的尺寸。在这项工作中,我们建议通过估计$ k \ leq 4 $的子图计数来有效地近似描述符,从而设计了两个现有的图形比较范式:图形内核和netSimile。我们的算法需要在边缘流进行单个扫描,其空间复杂性是输入大小的一部分,并且通过简单的采样方案近似嵌入。我们的设计利用了可用内存和估计精度之间的权衡,以提供一种适合有限的内存需求的方法。我们在现实世界网络上执行了广泛的实验,并证明我们的算法可以很好地扩展到大量图。
Embedding networks into a fixed dimensional feature space, while preserving its essential structural properties is a fundamental task in graph analytics. These feature vectors (graph descriptors) are used to measure the pairwise similarity between graphs. This enables applying data mining algorithms (e.g classification, clustering, or anomaly detection) on graph-structured data which have numerous applications in multiple domains. State-of-the-art algorithms for computing descriptors require the entire graph to be in memory, entailing a huge memory footprint, and thus do not scale well to increasing sizes of real-world networks. In this work, we propose streaming algorithms to efficiently approximate descriptors by estimating counts of sub-graphs of order $k\leq 4$, and thereby devise extensions of two existing graph comparison paradigms: the Graphlet Kernel and NetSimile. Our algorithms require a single scan over the edge stream, have space complexity that is a fraction of the input size, and approximate embeddings via a simple sampling scheme. Our design exploits the trade-off between available memory and estimation accuracy to provide a method that works well for limited memory requirements. We perform extensive experiments on real-world networks and demonstrate that our algorithms scale well to massive graphs.