FusedMM: A Unified SDDMM-SpMM Kernel for Graph Embedding and Graph Neural Networks

Paper Title

FusedMM: A Unified SDDMM-SpMM Kernel for Graph Embedding and Graph Neural Networks

Authors

Md. Khaledur Rahman, Majedul Haque Sujon, Ariful Azad

Abstract

We develop a fused matrix multiplication kernel that unifies sampled dense-dense matrix multiplication and sparse-dense matrix multiplication under a single operation called FusedMM. By using user-defined functions, FusedMM can capture almost all computational patterns needed by popular graph embedding and GNN approaches. FusedMM is an order of magnitude faster than its equivalent kernels in Deep Graph Library. The superior performance of FusedMM comes from the low-level vectorized kernels, a suitable load balancing scheme and an efficient utilization of the memory bandwidth. FusedMM can tune its performance using a code generator and perform equally well on Intel, AMD and ARM processors. FusedMM speeds up an end-to-end graph embedding algorithm by up to 28x on different processors.
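To make the idea of fusing the two operations concrete, here is a minimal pure-Python sketch. It is not the FusedMM API or the authors' implementation: the function names (`sddmm`, `spmm`, `fused_mm`), the CSR layout (`indptr`, `indices`), and the choice of a sigmoid as the user-defined edge function are all assumptions made for illustration. The unfused path first computes one score per stored edge (SDDMM) and then aggregates neighbor features with those scores (SpMM); the fused path does both in a single pass over the edges, so the intermediate list of edge scores is never stored.

```python
import math


def sddmm(indptr, indices, X, Y, edge_op):
    """Unfused step 1: one score per stored edge (u, v) of the CSR pattern."""
    vals = []
    for u in range(len(indptr) - 1):
        for k in range(indptr[u], indptr[u + 1]):
            v = indices[k]
            dot = sum(a * b for a, b in zip(X[u], Y[v]))
            vals.append(edge_op(dot))
    return vals


def spmm(indptr, indices, vals, Y):
    """Unfused step 2: Z[u] = sum over edges (u, v) of vals[edge] * Y[v]."""
    d = len(Y[0])
    Z = [[0.0] * d for _ in range(len(indptr) - 1)]
    for u in range(len(indptr) - 1):
        for k in range(indptr[u], indptr[u + 1]):
            v = indices[k]
            for j in range(d):
                Z[u][j] += vals[k] * Y[v][j]
    return Z


def fused_mm(indptr, indices, X, Y, edge_op):
    """Fused version: score each edge and aggregate immediately,
    without materializing the per-edge scores."""
    d = len(Y[0])
    Z = [[0.0] * d for _ in range(len(indptr) - 1)]
    for u in range(len(indptr) - 1):
        for k in range(indptr[u], indptr[u + 1]):
            v = indices[k]
            s = edge_op(sum(a * b for a, b in zip(X[u], Y[v])))
            for j in range(d):
                Z[u][j] += s * Y[v][j]
    return Z


def sigmoid(t):
    """Hypothetical user-defined edge function (an assumption for this sketch)."""
    return 1.0 / (1.0 + math.exp(-t))


# Toy graph in CSR form: node 0 -> {1, 2}, node 1 -> {0}, node 2 has no edges.
indptr = [0, 2, 3, 3]
indices = [1, 2, 0]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = X  # same features on both sides, as in a simple embedding setting

Z_fused = fused_mm(indptr, indices, X, Y, sigmoid)
Z_two_step = spmm(indptr, indices, sddmm(indptr, indices, X, Y, sigmoid), Y)
```

Both paths produce the same result on this toy input. The sketch only shows the data flow; it does not model the vectorization, load balancing, or code generation that the abstract credits for the reported speedups.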
