Evaluating Self-Supervised Learning for Molecular Graph Embeddings

Paper title

Evaluating Self-Supervised Learning for Molecular Graph Embeddings

Authors

Hanchen Wang, Jean Kaddour, Shengchao Liu, Jian Tang, Joan Lasenby, Qi Liu

Abstract

Graph Self-Supervised Learning (GSSL) provides a robust pathway for acquiring embeddings without expert labelling, a capability that carries profound implications for molecular graphs due to the staggering number of potential molecules and the high cost of obtaining labels. However, GSSL methods are designed not for optimisation within a specific domain but rather for transferability across a variety of downstream tasks. This broad applicability complicates their evaluation. Addressing this challenge, we present "Molecular Graph Representation Evaluation" (MOLGRAPHEVAL), generating detailed profiles of molecular graph embeddings with interpretable and diversified attributes. MOLGRAPHEVAL offers a suite of probing tasks grouped into three categories: (i) generic graph, (ii) molecular substructure, and (iii) embedding space properties. By leveraging MOLGRAPHEVAL to benchmark existing GSSL methods against both current downstream datasets and our suite of tasks, we uncover significant inconsistencies between inferences drawn solely from existing datasets and those derived from more nuanced probing. These findings suggest that current evaluation methodologies fail to capture the entirety of the landscape.

Paper title