论文标题
经典图形结构特征优于基于分解的基于分解的图形嵌入方法
Classic Graph Structural Features Outperform Factorization-Based Graph Embedding Methods on Community Labeling
论文作者
论文摘要
图表示学习(也称为图形嵌入)是将网络结构纳入机器学习模型的流行技术。无监督的图形嵌入方法旨在通过学习每个节点的低维矢量表示(嵌入)来捕获图形结构。尽管这些嵌入在各种下游转导的机器学习任务中广泛使用,但对这种方法对于常见任务的有效性几乎没有原则分析。在这项工作中,我们为对成对社区标签的共同任务的一类嵌入提供了经验和理论分析。这是经典社区检测问题的二进制变体,该变体试图建立一个分类器来确定一对顶点是否参与社区。符合我们的基础理解的目标,我们专注于一类流行的无监督嵌入技术,这些技术学习了顶点接近矩阵的低等级因素(此类包括Grarep,DeepWalk,node2vec,netmf)。我们对具有地面真理的各种真实和合成图的社区标记进行详细的经验分析。在我们研究的所有情况下,通过嵌入功能训练的模型在社区标签上的表现不佳。在Constrast中,具有经典图结构特征的简单逻辑模型可容易地超过嵌入模型。为了获得更有原则的理解,我们为这些嵌入在捕获社区结构中的有效性(在)有效性方面提供了理论分析。我们正式证明,流行的低维分解方法不能产生社区结构,或者只能产生``不稳定''社区。这些社区在小小的扰动下本质上是不稳定的。
Graph representation learning (also called graph embeddings) is a popular technique for incorporating network structure into machine learning models. Unsupervised graph embedding methods aim to capture graph structure by learning a low-dimensional vector representation (the embedding) for each node. Despite the widespread use of these embeddings for a variety of downstream transductive machine learning tasks, there is little principled analysis of the effectiveness of this approach for common tasks. In this work, we provide an empirical and theoretical analysis for the performance of a class of embeddings on the common task of pairwise community labeling. This is a binary variant of the classic community detection problem, which seeks to build a classifier to determine whether a pair of vertices participate in a community. In line with our goal of foundational understanding, we focus on a popular class of unsupervised embedding techniques that learn low rank factorizations of a vertex proximity matrix (this class includes methods like GraRep, DeepWalk, node2vec, NetMF). We perform detailed empirical analysis for community labeling over a variety of real and synthetic graphs with ground truth. In all cases we studied, the models trained from embedding features perform poorly on community labeling. In constrast, a simple logistic model with classic graph structural features handily outperforms the embedding models. For a more principled understanding, we provide a theoretical analysis for the (in)effectiveness of these embeddings in capturing the community structure. We formally prove that popular low-dimensional factorization methods either cannot produce community structure, or can only produce ``unstable" communities. These communities are inherently unstable under small perturbations.