Paper Title

ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction

Authors

Zhongkai Hao, Chengqiang Lu, Zheyuan Hu, Hao Wang, Zhenya Huang, Qi Liu, Enhong Chen, Cheekong Lee

Abstract

Molecular property prediction (e.g., energy) is an essential problem in chemistry and biology. Unfortunately, many supervised learning methods suffer from the scarcity of labeled molecules in chemical space, because such property labels are generally obtained by Density Functional Theory (DFT) calculations, which are extremely computationally costly. An effective solution is to incorporate unlabeled molecules in a semi-supervised fashion. However, learning semi-supervised representations for large numbers of molecules is challenging, with difficulties including the joint representation of both molecular essence and structure and the conflict between representation learning and property learning. Here we propose a novel framework called Active Semi-supervised Graph Neural Network (ASGN) that incorporates both labeled and unlabeled molecules. Specifically, ASGN adopts a teacher-student framework. In the teacher model, we propose a novel semi-supervised learning method to learn a general representation that jointly exploits information from molecular structure and molecular distribution. Then, in the student model, we target the property prediction task to deal with the learning loss conflict. Finally, we propose a novel active learning strategy based on molecular diversity to select informative data throughout the framework's learning process. We conduct extensive experiments on several public datasets. Experimental results show the remarkable performance of our ASGN framework.
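The abstract does not say how the diversity-based active learning step chooses molecules. As a rough illustration only, the sketch below uses greedy k-center (farthest-point) selection, a common diversity heuristic that may differ from the authors' actual strategy. The function names and the toy 2-D embeddings are hypothetical; in practice the embeddings would be molecular representations, for example those produced by the teacher model.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_diverse(embeddings, labeled_idx, budget):
    """Greedy k-center selection (an assumed stand-in, not the paper's method).

    Repeatedly picks the point that lies farthest from every already
    labeled or already selected point, so the chosen batch spreads out
    over the embedding space.
    """
    selected = []
    centers = list(labeled_idx)
    if not centers:
        # No labels yet: seed with the first point and count it in the batch.
        centers = [0]
        selected = [0]

    # Distance from each point to its nearest current center.
    min_dist = [
        min(euclidean(e, embeddings[c]) for c in centers) for e in embeddings
    ]

    while len(selected) < budget:
        candidate = max(range(len(embeddings)), key=lambda i: min_dist[i])
        if min_dist[candidate] == 0.0:
            break  # every remaining point coincides with a center
        selected.append(candidate)
        # The new pick becomes a center; refresh nearest-center distances.
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], euclidean(e, embeddings[candidate]))

    return selected


# Toy example: two tight clusters plus one outlier; molecule 0 is labeled.
toy_embeddings = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.1, 0.0), (0.0, 5.0)]
picked = select_diverse(toy_embeddings, labeled_idx=[0], budget=2)
print(picked)
```

In this toy case the two picks land in the regions not yet covered by the labeled molecule, rather than on its near-duplicate neighbor, which is the behavior a diversity criterion is meant to encourage.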
