Paper title
Predicting Chemical Properties using Self-Attention Multi-task Learning based on SMILES Representation
Paper authors
Paper abstract
In the computational prediction of chemical compound properties, molecular descriptors and fingerprints encoded as low-dimensional vectors are commonly used. Selecting appropriate descriptors and fingerprints is both important and challenging, because the performance of such models depends heavily on the descriptors. To overcome this challenge, natural language processing models that take simplified molecular-input line-entry system (SMILES) strings as input have been studied, and several transformer-variant models have achieved superior results compared with conventional methods. In this study, we explore the structural differences of the transformer-variant models and propose a new self-attention-based model. The representation learning performance of the self-attention module was evaluated in a multi-task learning environment using imbalanced chemical datasets. The experimental results show that our model achieved competitive outcomes on several benchmark datasets. The source code of our experiments is available at https://github.com/arwhirang/sa-mtl, and the datasets are available from the same URL.
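To illustrate how a SMILES string can serve as input to a language-style model, the following is a minimal sketch of tokenizing SMILES and mapping tokens to integer ids. It is not the authors' preprocessing: the regular expression, the function names (`tokenize`, `build_vocab`, `encode`), and the `<pad>`/`<unk>` special tokens are assumptions made for illustration only.

```python
import re

# Hypothetical token pattern (an assumption, not taken from the paper).
# Alternatives are tried left to right, so "Cl" and "Br" win over "C" and "B".
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"            # bracket atoms such as [NH3+] or [C@@H]
    r"|Br|Cl"                # two-letter atoms of the organic subset
    r"|%\d{2}"               # two-digit ring-closure labels, e.g. %12
    r"|[A-Za-z]"             # single-letter atoms (aromatic atoms are lowercase)
    r"|\d"                   # single-digit ring-closure labels
    r"|[=#\-+()/\\.:~@*$]"   # bonds, branches, and other symbols
)


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Reject input containing characters the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def build_vocab(smiles_list):
    """Assign an integer id to every token seen in the given SMILES strings."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for smiles in smiles_list:
        for token in tokenize(smiles):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(smiles, vocab, max_len):
    """Convert a SMILES string to a fixed-length list of token ids."""
    ids = [vocab.get(t, vocab["<unk>"]) for t in tokenize(smiles)][:max_len]
    # Right-pad with the <pad> id so every sequence has length max_len.
    return ids + [vocab["<pad>"]] * (max_len - len(ids))
```

For example, `encode("CCO", build_vocab(["CCO"]), 5)` yields a length-5 id list whose last two entries are padding. A sequence encoded this way could then be passed through an embedding layer into a self-attention encoder.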