Paper Title
UniNet: Scalable Network Representation Learning with Metropolis-Hastings Sampling
Paper Authors
Paper Abstract
Network representation learning (NRL) techniques have been successfully adopted in various data mining and machine learning applications. Random walk based NRL is a popular paradigm: it uses a set of random walks to capture the network's structural information and then employs word2vec models to learn low-dimensional representations. However, until now there has been no framework that unifies existing random walk based NRL models and supports efficient learning from large networks. The main obstacles are the diversity of random walk models and the inefficiency of the sampling methods used for random walk generation. In this paper, we first introduce a new and efficient edge sampler based on the Metropolis-Hastings sampling technique, and theoretically show that the edge sampler converges to arbitrary discrete probability distributions. We then propose a random walk model abstraction in which users can easily define different transition probabilities by specifying dynamic edge weights and random walk states. The abstraction is efficiently supported by our edge sampler, since the sampler can draw samples from an unnormalized probability distribution in constant time. Finally, with the new edge sampler and the random walk model abstraction, we carefully implement a scalable NRL framework called UniNet. We conduct comprehensive experiments with five random walk based NRL models over eleven real-world datasets, and the results clearly demonstrate the efficiency of UniNet on billion-edge networks.