Title
Evaluating Point-Prediction Uncertainties in Neural Networks for Drug Discovery
Authors
Abstract
Neural network (NN) models offer the potential to speed up the drug discovery process and reduce its failure rate. Their success, however, requires uncertainty quantification (UQ), because drug discovery explores chemical space beyond the training data distribution. Standard NN models do not provide uncertainty information. Methods that combine Bayesian models with NN models address this issue, but they are difficult to implement and more expensive to train. Some methods also require changing the NN architecture or training procedure, which limits the choice of NN models. Moreover, predictive uncertainty can come from different sources. The ability to model each type of predictive uncertainty separately is important, because the model can then take different actions depending on the source of the uncertainty. In this paper, we examine UQ methods that estimate different sources of predictive uncertainty for NN models aimed at drug discovery. We use prior knowledge of chemical compounds to design the experiments. Using a visualization method, we create non-overlapping and chemically diverse partitions from a collection of chemical compounds. These partitions are used as training and test set splits to explore NN model uncertainty. We demonstrate how the uncertainties estimated by the selected methods describe different sources of uncertainty under different partitions and featurization schemes, and how they relate to prediction error.
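The abstract does not name the selected UQ methods. As a hedged illustration of what modelling different sources of predictive uncertainty separately can look like, the sketch below assumes a deep ensemble in which each of M members predicts a mean and a noise variance for every compound. This is a common approach, not necessarily the one used in this paper. By the law of total variance, the total predictive variance splits into an aleatoric term (the average predicted noise, E_m[sigma_m^2]) and an epistemic term (the disagreement between members, Var_m[mu_m]). The function name `decompose_uncertainty` and the toy numbers are hypothetical.

```python
import numpy as np


def decompose_uncertainty(means, variances):
    """Split ensemble predictive variance into aleatoric and epistemic parts.

    Hypothetical helper. `means` and `variances` have shape (M, N): the
    predicted mean and predicted noise variance of M ensemble members for
    N compounds (assumes each member outputs both quantities).
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    if means.shape != variances.shape or means.ndim != 2:
        raise ValueError("means and variances must both have shape (M, N)")
    prediction = means.mean(axis=0)    # ensemble point prediction
    aleatoric = variances.mean(axis=0)  # E_m[sigma_m^2]: noise in the data
    epistemic = means.var(axis=0)       # Var_m[mu_m]: member disagreement (ddof=0)
    total = aleatoric + epistemic       # law of total variance
    return prediction, aleatoric, epistemic, total


if __name__ == "__main__":
    # Hypothetical outputs of a 3-member ensemble on 2 compounds.
    mu = np.array([[1.0, 2.0],
                   [1.0, 3.0],
                   [1.0, 4.0]])
    s2 = np.array([[0.5, 0.1],
                   [0.5, 0.1],
                   [0.5, 0.1]])
    pred, alea, epi, tot = decompose_uncertainty(mu, s2)
    print("prediction:", pred)
    print("aleatoric: ", alea)
    print("epistemic: ", epi)
    print("total:     ", tot)
```

In this toy case, the members agree perfectly on the first compound, so its uncertainty is purely aleatoric. They disagree on the second compound, which adds epistemic variance. Epistemic variance is the component expected to grow for compounds far from the training distribution, such as a held-out, chemically distinct partition.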