Paper Title

ToDD: Topological Compound Fingerprinting in Computer-Aided Drug Discovery

Paper Authors

Andac Demir, Baris Coskunuzer, Ignacio Segovia-Dominguez, Yuzhou Chen, Yulia Gel, Bulent Kiziltan

Paper Abstract

In computer-aided drug discovery (CADD), virtual screening (VS) is used for identifying the drug candidates that are most likely to bind to a molecular target in a large library of compounds. Most VS methods to date have focused on using canonical compound representations (e.g., SMILES strings, Morgan fingerprints) or generating alternative fingerprints of the compounds by training progressively more complex variational autoencoders (VAEs) and graph neural networks (GNNs). Although VAEs and GNNs led to significant improvements in VS performance, these methods suffer from reduced performance when scaling to large virtual compound datasets. The performance of these methods has shown only incremental improvements in the past few years. To address this problem, we developed a novel method using multiparameter persistence (MP) homology that produces topological fingerprints of the compounds as multidimensional vectors. Our primary contribution is framing the VS process as a new topology-based graph ranking problem by partitioning a compound into chemical substructures informed by the periodic properties of its atoms and extracting their persistent homology features at multiple resolution levels. We show that the margin loss fine-tuning of pretrained Triplet networks attains highly competitive results in differentiating between compounds in the embedding space and ranking their likelihood of becoming effective drug candidates. We further establish theoretical guarantees for the stability properties of our proposed MP signatures, and demonstrate that our models, enhanced by the MP signatures, outperform state-of-the-art methods on benchmark datasets by a wide and highly statistically significant margin (e.g., 93% gain for Cleves-Jain and 54% gain for DUD-E Diverse dataset).
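
The abstract mentions margin loss fine-tuning of Triplet networks without stating the loss itself. As a rough illustration only, the sketch below computes the standard triplet margin loss, max(0, d(a, p) - d(a, n) + margin), on toy embedding vectors. The Euclidean distance, the default margin of 1.0, the function names, and the example vectors are all assumptions made for this sketch; they are not taken from the paper.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss (an assumed form, not the paper's exact loss).

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin`; otherwise it grows linearly.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings: an anchor compound, a compound with the same
# activity label (positive), and compounds with a different label (negative).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
easy_negative = [3.0, 4.0]   # already far from the anchor
hard_negative = [1.0, 0.0]   # as close to the anchor as the positive

easy_loss = triplet_margin_loss(anchor, positive, easy_negative)
hard_loss = triplet_margin_loss(anchor, positive, hard_negative)
```

In this toy setup the easy negative contributes no loss, while the hard negative is penalized by the full margin, which is the behavior that pushes compounds with different activity apart in the embedding space.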
