论文标题
DigraphWave:通过在有向图上扩散的结构节点嵌入的可扩展提取
Digraphwave: Scalable Extraction of Structural Node Embeddings via Diffusion on Directed Graphs
论文作者
论文摘要
结构性节点嵌入,向量捕获图中每个节点的局部连接信息,在数据挖掘和机器学习中具有许多应用程序,例如网络对齐和节点分类,群集和异常检测。为了分析有向图的分析,例如交易图,通信网络和社交网络,在结构节点嵌入中捕获定向信息的能力是非常需要的,嵌入式提取方法的可扩展性也是如此。但是,大多数现有方法仅是为无向图设计的。因此,我们提出了DigraphWave - 一种可扩展算法,用于在有向图上提取结构节点嵌入。 DigraphWave嵌入由压缩扩散模式特征组成,它们的增强两倍以增加其歧视能力。通过证明扩散初始化节点局部附近的热量上的下限,可以建立理论上是合理的扩散时间尺度值,而DigraphWave仅留下两个易于干扰的超范围:嵌入尺寸和邻域分辨率指定器。在我们的实验中,两种嵌入的增强(称为换位和聚集)被证明会导致对自动形态的宏F1得分显着提高,而DigraphWave优于所有其他结构性嵌入碱基。此外,DigraphWave要么胜过实际图数据集上所有基准的性能,要么在网络对齐任务中显示出特别大的性能增长,同时也可以扩展到具有数百万节点和边缘的图形,比以前的基于扩散模式的方法更快地运行30倍,并且具有内存消耗的一部分。
Structural node embeddings, vectors capturing local connectivity information for each node in a graph, have many applications in data mining and machine learning, e.g., network alignment and node classification, clustering and anomaly detection. For the analysis of directed graphs, e.g., transactions graphs, communication networks and social networks, the capability to capture directional information in the structural node embeddings is highly desirable, as is scalability of the embedding extraction method. Most existing methods are nevertheless only designed for undirected graph. Therefore, we present Digraphwave -- a scalable algorithm for extracting structural node embeddings on directed graphs. The Digraphwave embeddings consist of compressed diffusion pattern signatures, which are twice enhanced to increase their discriminate capacity. By proving a lower bound on the heat contained in the local vicinity of a diffusion initialization node, theoretically justified diffusion timescale values are established, and Digraphwave is left with only two easy-to-interpret hyperparameters: the embedding dimension and a neighbourhood resolution specifier. In our experiments, the two embedding enhancements, named transposition and aggregation, are shown to lead to a significant increase in macro F1 score for classifying automorphic identities, with Digraphwave outperforming all other structural embedding baselines. Moreover, Digraphwave either outperforms or matches the performance of all baselines on real graph datasets, displaying a particularly large performance gain in a network alignment task, while also being scalable to graphs with millions of nodes and edges, running up to 30x faster than a previous diffusion pattern based method and with a fraction of the memory consumption.