Paper title
Predicting drug properties with parameter-free machine learning: Pareto-Optimal Embedded Modeling (POEM)
Paper authors
Paper abstract
Predicting the absorption, distribution, metabolism, excretion, and toxicity (ADMET) of small molecules from their molecular structure is a central problem in medicinal chemistry and of great practical importance in drug discovery. Creating predictive models conventionally requires substantial trial and error in selecting molecular representations and machine learning (ML) algorithms and in tuning hyperparameters. A generally applicable method that performs well on all datasets without tuning would be of great value but is currently lacking. Here, we describe Pareto-Optimal Embedded Modeling (POEM), a similarity-based method for predicting molecular properties. POEM is a non-parametric, supervised ML algorithm developed to generate reliable predictive models without the need for optimization. POEM's predictive strength is obtained by combining multiple different representations of molecular structures in a context-specific manner while maintaining low dimensionality. We benchmark POEM against industry-standard ML algorithms and published results across 17 classification tasks. POEM performs well in all cases and reduces the risk of overfitting.
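The abstract does not spell out how the representations are combined, so the following is only an illustrative sketch of one possible reading of "Pareto-optimal" in a similarity-based setting, not the published POEM algorithm. It assumes that each training molecule has a distance to the query molecule under several representations, keeps the training molecules that no other molecule beats in every representation at once (the Pareto front), and averages their labels. The function names `pareto_front` and `predict_from_front`, the distance values, and the averaging rule are all hypothetical.

```python
from typing import List, Sequence


def pareto_front(distances: Sequence[Sequence[float]]) -> List[int]:
    """Return indices of rows that no other row dominates.

    Each row holds one training molecule's distances to the query,
    one column per molecular representation (assumed setup).
    Row a dominates row b if a <= b in every column and a < b in at least one.
    """
    front = []
    for i, row_i in enumerate(distances):
        dominated = False
        for j, row_j in enumerate(distances):
            if i == j:
                continue
            no_worse = all(a <= b for a, b in zip(row_j, row_i))
            strictly_better = any(a < b for a, b in zip(row_j, row_i))
            if no_worse and strictly_better:
                dominated = True
                break
        if not dominated:
            front.append(i)
    return front


def predict_from_front(
    distances: Sequence[Sequence[float]], labels: Sequence[float]
) -> float:
    """Average the labels of the Pareto-optimal neighbours (hypothetical rule)."""
    if len(distances) == 0:
        raise ValueError("need at least one training molecule")
    front = pareto_front(distances)
    return sum(labels[i] for i in front) / len(front)


# Made-up distances of five training molecules to one query molecule
# under two hypothetical representations, with binary class labels.
example_distances = [(0.1, 0.9), (0.2, 0.2), (0.8, 0.1), (0.5, 0.5), (0.9, 0.9)]
example_labels = [1, 1, 0, 0, 0]

print(pareto_front(example_distances))
print(predict_from_front(example_distances, example_labels))
```

In this toy case the front contains the first three molecules: each is closest to the query under some trade-off between the two representations, while the last two are farther away than the second molecule in both. Selecting neighbours this way needs no weights or neighbour count to be tuned, which is in the spirit of the parameter-free claim, but the actual method may differ substantially.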