论文标题

基于BFS的分布式算法,用于平行局部定向子枚举

BFS based distributed algorithm for parallel local directed sub-graph enumeration

论文作者

Levinas, Itay, Scherz, Roy, Louzoun, Yoram

论文摘要

估计子图的频率对于许多任务,包括子图同构,基于内核的异常检测和网络结构分析至关重要。虽然提出了多种算法以进行完全枚举或基于抽样的估计值,但这些方法在非常大的图中失败。并行化的最新进展允许在非常大的图中估算总数计数。计算与每个顶点相关的每个子图的频率的任务也收到了无向图的出色解决方案。但是,目前尚无针对非常大的有向图的好解决方案。 我们在这里提出VDMC(顶点特定的分布式基序计数) - 一种完全分布的算法,以最佳计数与图形每个顶点相关的所有3和4顶点连接的有向图(子图案)。 VDMC仅计数每个基序一次,其疗效在计数基序的数量中是线性的。它完全平行于基于GPU的计算有效。 VDMC基于三个主要要素:1)订购顶点,并且仅计算包含增加订单顶点的基序,2)基于组成基序的BFS的平均长度的子顺序基序,而3)删除整个图的同构象征性仅一次。我们在这里将VDMC与预期基序数量的分析估计值进行比较,并显示其准确性。 VDMC可作为高效的CPU和GPU代码,具有新型的数据结构,可有效地进行图形操作。我们显示了VDMC和现实图形的功效。 VDMC允许在大图中对每个顶点周围的子图形频率进行精确分析,并为扩展方法开辟了道路,直到现在,直到现在将数千个边缘的图限制在具有数百万边及以上的图形上。 git:https://github.com/louzounlab/graph-measures

Estimating the frequency of sub-graphs is of importance for many tasks, including sub-graph isomorphism, kernel-based anomaly detection, and network structure analysis. While multiple algorithms were proposed for full enumeration or sampling-based estimates, these methods fail in very large graphs. Recent advances in parallelization allow for estimates of total sub-graphs counts in very large graphs. The task of counting the frequency of each sub-graph associated with each vertex also received excellent solutions for undirected graphs. However, there is currently no good solution for very large directed graphs. We here propose VDMC (Vertex specific Distributed Motif Counting) -- a fully distributed algorithm to optimally count all the 3 and 4 vertices connected directed graphs (sub-graph motifs) associated with each vertex of a graph. VDMC counts each motif only once and its efficacy is linear in the number of counted motifs. It is fully parallelized to be efficient in GPU-based computation. VDMC is based on three main elements: 1) Ordering the vertices and only counting motifs containing increasing order vertices, 2) sub-ordering motifs based on the average length of the BFS composing the motif, and 3) removing isomorphisms only once for the entire graph. We here compare VDMC to analytical estimates of the expected number of motifs and show its accuracy. VDMC is available as a highly efficient CPU and GPU code with a novel data structure for efficient graph manipulation. We show the efficacy of VDMC and real-world graphs. VDMC allows for the precise analysis of sub-graph frequency around each vertex in large graphs and opens the way for the extension of methods until now limited to graphs of thousands of edges to graphs with millions of edges and above. GIT: https://github.com/louzounlab/graph-measures

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源