论文标题
一般,功能强大,可扩展的图形变压器的配方
Recipe for a General, Powerful, Scalable Graph Transformer
论文作者
论文摘要
我们提出了一个关于如何建立具有线性复杂性和最先进结果的一般,强大,可扩展(GPS)的图形变压器的食谱。 Graph Transformers(GTS)在图形表示学习领域中具有多种近期出版物的知名度,但它们对构成良好的位置或结构编码的共同基础以及与众不同。在本文中,我们总结了具有更清晰的定义的不同类型的编码,并将其分类为$ \ textIt {local} $,$ \ textit {global} $或$ \ textit {fextit {ferseal} $。先前的GTS被限制为具有数百个节点的小图,在这里,我们提出了第一个体系结构,具有复杂性在节点和边缘$ O(N+E)$中的复杂性线性,通过将局部的实地聚合与完全连接的变压器解耦来。我们认为,这种解耦并不会对表现性产生负面影响,而我们的体系结构是图形上的通用函数近似器。我们的GPS配方包括选择3种主要成分:(i)位置/结构编码,(ii)局部消息通讯机制和(iii)全球注意机制。我们提供一个模块化框架$ \ textit {GraphGps} $,该{GraphGps} $支持多种类型的编码,并且在小图中提供了效率和可扩展性。我们在16个基准测试上测试了我们的体系结构,并在所有这些基准测试中都表现出了高度竞争的结果,这表明了模块化和不同策略的结合所获得的经验益处。
We propose a recipe on how to build a general, powerful, scalable (GPS) graph Transformer with linear complexity and state-of-the-art results on a diverse set of benchmarks. Graph Transformers (GTs) have gained popularity in the field of graph representation learning with a variety of recent publications but they lack a common foundation about what constitutes a good positional or structural encoding, and what differentiates them. In this paper, we summarize the different types of encodings with a clearer definition and categorize them as being $\textit{local}$, $\textit{global}$ or $\textit{relative}$. The prior GTs are constrained to small graphs with a few hundred nodes, here we propose the first architecture with a complexity linear in the number of nodes and edges $O(N+E)$ by decoupling the local real-edge aggregation from the fully-connected Transformer. We argue that this decoupling does not negatively affect the expressivity, with our architecture being a universal function approximator on graphs. Our GPS recipe consists of choosing 3 main ingredients: (i) positional/structural encoding, (ii) local message-passing mechanism, and (iii) global attention mechanism. We provide a modular framework $\textit{GraphGPS}$ that supports multiple types of encodings and that provides efficiency and scalability both in small and large graphs. We test our architecture on 16 benchmarks and show highly competitive results in all of them, show-casing the empirical benefits gained by the modularity and the combination of different strategies.