在GPU上并行最大列表

论文标题

在GPU上并行最大列表

Parallelizing Maximal Clique Enumeration on GPUs

论文作者

Almasri, Mohammad, Chang, Yen-Hsiang, Hajj, Izzat El, Nagi, Rakesh, Xiong, Jinjun, Hwu, Wen-mei

论文摘要

我们提出了一种用于确切最大列表（MCE）的GPU解决方案，该解决方案在Bron-Kerbosch算法之后执行搜索树遍历。在GPU上并行化MCE的先前作品执行了树的广度优先遍历，由于深度级别的树节点数量的爆炸数量爆炸，该树的可扩展性有限。我们建议通过并行执行独立子树的深度优先遍历在GPU上并行化MCE。由于MCE遭受了高负载不平衡和内存能力要求，因此我们提出了一个动态负载平衡的工人列表，以及部分诱导的子图表和不排除的顶点集的紧凑表示，以调节内存消耗。我们的评估表明，我们对单个GPU的GPU实现优于最先进的平行CPU实现，几何平均值为4.9倍（高达16.7倍），并有效地扩展到多个GPU。我们的代码已开源，以实现有关加速MCE的进一步研究。

We present a GPU solution for exact maximal clique enumeration (MCE) that performs a search tree traversal following the Bron-Kerbosch algorithm. Prior works on parallelizing MCE on GPUs perform a breadth-first traversal of the tree, which has limited scalability because of the explosion in the number of tree nodes at deep levels. We propose to parallelize MCE on GPUs by performing depth-first traversal of independent subtrees in parallel. Since MCE suffers from high load imbalance and memory capacity requirements, we propose a worker list for dynamic load balancing, as well as partial induced subgraphs and a compact representation of excluded vertex sets to regulate memory consumption. Our evaluation shows that our GPU implementation on a single GPU outperforms the state-of-the-art parallel CPU implementation by a geometric mean of 4.9x (up to 16.7x), and scales efficiently to multiple GPUs. Our code has been open-sourced to enable further research on accelerating MCE.

下载PDF全文

下载文献需遵守相关版权规定

论文标题