Paper Title
Graph Neural Networks for Link Prediction with Subgraph Sketching
Paper Authors
Paper Abstract
Many Graph Neural Networks (GNNs) perform poorly compared to simple heuristics on Link Prediction (LP) tasks. This is due to limitations in expressive power, such as the inability to count triangles (the backbone of most LP heuristics), and because they cannot distinguish automorphic nodes (those having identical structural roles). Both expressiveness issues can be alleviated by learning link (rather than node) representations and incorporating structural features such as triangle counts. Since explicit link representations are often prohibitively expensive, recent works have resorted to subgraph-based methods, which have achieved state-of-the-art performance for LP but suffer from poor efficiency due to high levels of redundancy between subgraphs. We analyze the components of subgraph GNN (SGNN) methods for link prediction. Based on our analysis, we propose a novel full-graph GNN called ELPH (Efficient Link Prediction with Hashing) that passes subgraph sketches as messages to approximate the key components of SGNNs without explicit subgraph construction. ELPH is provably more expressive than Message Passing GNNs (MPNNs). It outperforms existing SGNN models on many standard LP benchmarks while being orders of magnitude faster. However, it shares the common GNN limitation that it is only efficient when the dataset fits in GPU memory. Accordingly, we develop a highly scalable model, called BUDDY, which uses feature precomputation to circumvent this limitation without sacrificing predictive performance. Our experiments show that BUDDY also outperforms SGNNs on standard LP benchmarks while being highly scalable and faster than ELPH.
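The sketching idea in the abstract can be made concrete with a small example. What follows is a minimal illustrative sketch, not the authors' implementation: the paper combines HyperLogLog (for cardinalities) and MinHash (for Jaccard similarity), whereas this toy version uses only MinHash signatures plus exact degrees. Each node carries a MinHash signature of its neighbourhood, signatures propagate by element-wise minimum (a valid message-passing aggregation), and comparing two nodes' signatures yields an approximate common-neighbour count for a candidate link without constructing the enclosing subgraph. The toy graph `adj` and all helper names are hypothetical.

```python
import numpy as np

NUM_PERM = 128  # signature length; more permutations -> lower variance
rng = np.random.default_rng(42)
# Random odd multipliers and offsets defining NUM_PERM toy hash functions.
A = rng.integers(1, 2**63, size=NUM_PERM, dtype=np.uint64) | np.uint64(1)
B = rng.integers(0, 2**63, size=NUM_PERM, dtype=np.uint64)

def item_hashes(x: int) -> np.ndarray:
    """NUM_PERM independent 64-bit hashes of node id x (wraps mod 2**64)."""
    return (A * np.uint64(x + 1)) ^ B

def minhash(node_ids) -> np.ndarray:
    """MinHash signature of a set of node ids (element-wise minimum of hashes)."""
    sig = np.full(NUM_PERM, np.iinfo(np.uint64).max, dtype=np.uint64)
    for x in node_ids:
        sig = np.minimum(sig, item_hashes(x))
    return sig

# A hypothetical toy graph as adjacency lists.
adj = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1], 3: [0, 1], 4: [3]}

# "Sketches as messages": a node's 1-hop sketch is the element-wise min of its
# neighbours' base sketches. Min is associative and permutation-invariant, so
# this is a valid GNN-style aggregation.
base = {v: minhash([v]) for v in adj}
hop1 = {v: np.minimum.reduce([base[u] for u in adj[v]]) for v in adj}

def estimated_common_neighbours(u: int, v: int) -> float:
    """Approximate |N(u) ∩ N(v)| from sketches, without building a subgraph."""
    jaccard = float(np.mean(hop1[u] == hop1[v]))           # fraction of matching entries
    union = (len(adj[u]) + len(adj[v])) / (1.0 + jaccard)  # |A|+|B| = |A∪B|·(1+J)
    return jaccard * union                                 # |A∩B| = J·|A∪B|

# True common neighbours of (0, 1) are {2, 3}, so the estimate should be near 2.
print(estimated_common_neighbours(0, 1))
```

Because min-aggregation is order-free, these sketches can be passed as ordinary GNN messages (the ELPH view), or computed once per node ahead of training, which mirrors the feature precomputation that the abstract credits for BUDDY's scalability.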