通过随机预测快速近似cosimranks

论文标题

通过随机预测快速近似cosimranks

Fast Approximate CoSimRanks via Random Projections

论文作者

Yang, Renchi, Xiao, Xiaokui

论文摘要

给定带有$ n $节点的图$ g $，两个节点$ u，v \ in g $，{\ em cosimrank} value $ s（u，v）$量化了基于图形拓扑的$ u $和$ v $之间的相似性。与Simrank相比，Cosimrank在许多现实世界中都更加准确，更有效，包括同义词扩展，词典提取和知识图中的实体相关性。 $ g $中所有成对cosimranks的计算非常昂贵且具有挑战性。现有的解决方案都集中于为所有成对cosimranks的计算设计近似算法。为了获得所需的绝对精度保证$ε$，用于计算所有成对cosimranks的最新近似算法都需要$ o（n^3 \ log_2（\ ln（\ frac {1} 1}））$时间，即使$ε$很大，这也很昂贵。在本文中，我们提出了一种用于计算所有成对cosimrank值的快速随机算法。 \ rsim的基本思想是通过随机投影近似cosimrank计算中的$ n \ times n $矩阵乘法。从理论上讲，\ rsim在$ o（\ frac {n^2 \ ln（n）} {ε^2} \ ln（\ frac {1}ε））$ time中运行，同时确保在$ g $中，最多可确保最多$ε$的绝对错误。使用六个真实图的广泛实验表明，\ rSIM的数量级超过了最大的数量级。尤其是，在一个百万的Twitter图上，\ rsim回答了$ε$ -Approximate（$ε= 0.1 $），使用单个商品服务器在4小时内所有成对的cosimrank查询，而现有解决方案在一天之内未能终止。

Given a graph $G$ with $n$ nodes and two nodes $u,v\in G$, the {\em CoSimRank} value $s(u,v)$ quantifies the similarity between $u$ and $v$ based on graph topology. Compared to SimRank, CoSimRank is shown to be more accurate and effective in many real-world applications, including synonym expansion, lexicon extraction, and entity relatedness in knowledge graphs. The computation of all pairwise CoSimRanks in $G$ is highly expensive and challenging. Existing solutions all focus on devising approximate algorithms for the computation of all pairwise CoSimRanks. To attain a desired absolute accuracy guarantee $ε$, the state-of-the-art approximate algorithm for computing all pairwise CoSimRanks requires $O(n^3\log_2(\ln(\frac{1}ε)))$ time, which is prohibitively expensive even though $ε$ is large. In this paper, we propose \rsim, a fast randomized algorithm for computing all pairwise CoSimRank values. The basic idea of \rsim is to approximate the $n\times n$ matrix multiplications in CoSimRank computation via random projection. Theoretically, \rsim runs in $O(\frac{n^2\ln(n)}{ε^2}\ln(\frac{1}ε))$ time and meanwhile ensures an absolute error of at most $ε$ in each CoSimRank value in $G$ with a high probability. Extensive experiments using six real graphs demonstrate that \rsim is more than orders of magnitude faster than the state of the art. In particular, on a million-edge Twitter graph, \rsim answers the $ε$-approximate ($ε=0.1$) all pairwise CoSimRank query within 4 hours, using a single commodity server, while existing solutions fail to terminate within a day.

下载PDF全文

下载文献需遵守相关版权规定

论文标题