Paper Title
Quantifying Uncertainty in Deep Learning Approaches to Radio Galaxy Classification
Paper Authors
Paper Abstract
In this work we use variational inference to quantify the degree of uncertainty in deep learning model predictions of radio galaxy classification. We show that the level of model posterior variance for individual test samples is correlated with human uncertainty when labelling radio galaxies. We explore the model performance and uncertainty calibration for different weight priors and suggest that a sparse prior produces better-calibrated uncertainty estimates. Using the posterior distributions for individual weights, we demonstrate that we can prune 30% of the fully-connected layer weights without significant loss of performance by removing the weights with the lowest signal-to-noise ratio. A larger degree of pruning can be achieved using a Fisher-information-based ranking, but both pruning methods affect the uncertainty calibration for Fanaroff-Riley type I and type II radio galaxies differently. Like other work in this field, we experience a cold posterior effect, whereby the posterior must be down-weighted to achieve good predictive performance. We examine whether adapting the cost function to accommodate model misspecification can compensate for this effect, but find that it does not make a significant difference. We also examine the effect of principled data augmentation and find that this improves upon the baseline but does not compensate for the observed effect either. We interpret the cold posterior effect as arising from the overly effective curation of our training sample, which leads to likelihood misspecification, and raise this as a potential issue for future Bayesian deep learning approaches to radio galaxy classification.
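
To make the signal-to-noise pruning step mentioned in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a mean-field Gaussian variational posterior in which each weight has a mean `mu` and a standard deviation `sigma`, and it takes the signal-to-noise ratio to be |mu| / sigma. The function name `snr_prune_mask`, the layer shape, and the random values are all hypothetical and serve only as illustration.

```python
import numpy as np


def snr_prune_mask(mu, sigma, fraction):
    """Return a boolean mask that is False for the lowest-SNR weights.

    Assumption: SNR is defined as |mu| / sigma for each weight.
    """
    snr = np.abs(mu) / sigma
    n_prune = int(round(fraction * snr.size))
    mask = np.ones(snr.size, dtype=bool)
    if n_prune > 0:
        # Sort the flattened SNR values and drop the n_prune smallest.
        order = np.argsort(snr, axis=None)
        mask[order[:n_prune]] = False
    return mask.reshape(snr.shape)


# Hypothetical posterior parameters for a 10 x 10 fully-connected layer.
rng = np.random.default_rng(0)
mu = rng.normal(size=(10, 10))
sigma = rng.uniform(0.1, 1.0, size=(10, 10))

# Prune 30% of the weights, the fraction quoted in the abstract.
mask = snr_prune_mask(mu, sigma, 0.30)
pruned_mu = np.where(mask, mu, 0.0)
```

In this sketch the pruned weights have their posterior means set to zero. Whether the pruned network is then fine-tuned or evaluated directly is a separate design choice that the sketch does not address.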