Paper title
Asymptotics of $\ell_2$ Regularized Network Embeddings
Paper authors
Paper abstract
A common approach to solving prediction tasks on large networks, such as node classification or link prediction, begins by learning a Euclidean embedding of the nodes of the network, from which traditional machine learning methods can then be applied. This includes methods such as DeepWalk and node2vec, which learn embeddings by optimizing stochastic losses formed over subsamples of the graph at each iteration of stochastic gradient descent. In this paper, we study the effects of adding an $\ell_2$ penalty on the embedding vectors to the training loss of these types of methods. We prove that, under some exchangeability assumptions on the graph, this asymptotically leads to learning a graphon with a nuclear-norm-type penalty, and give guarantees for the asymptotic distribution of the learned embedding vectors. In particular, the exact form of the penalty depends on the choice of subsampling method used as part of stochastic gradient descent. We also illustrate empirically that concatenating node covariates to $\ell_2$ regularized node2vec embeddings leads to comparable, if not superior, performance relative to methods which incorporate node covariates and the network structure in a non-linear manner.
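To make the mechanism behind the nuclear-norm connection concrete, here is a minimal sketch of the kind of penalized objective the abstract describes; the notation ($U$, $V$ for matrices of embedding vectors, $\lambda$ for the penalty strength, $\ell$ for the subsample loss, $S$ for a random subsample) is illustrative and not taken from the paper itself:
\[
  \mathcal{L}_\lambda(U, V) \;=\; \mathbb{E}_{S}\big[\ell(S;\, U, V)\big] \;+\; \frac{\lambda}{2}\big(\|U\|_F^2 + \|V\|_F^2\big),
\]
where the rows of $U, V \in \mathbb{R}^{n \times d}$ are the embedding vectors of the $n$ nodes. The link to nuclear-norm-type penalties then comes from the standard variational characterization
\[
  \|M\|_* \;=\; \min_{UV^\top = M} \tfrac{1}{2}\big(\|U\|_F^2 + \|V\|_F^2\big) \quad (d \ge \operatorname{rank}(M)),
\]
so penalizing the squared $\ell_2$ norms of the embedding vectors acts, at the level of the induced low-rank matrix $UV^\top$, like a nuclear-norm penalty.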