Paper Title
Minimum Excess Risk in Bayesian Learning
Paper Authors
Paper Abstract
We analyze the best achievable performance of Bayesian learning under generative models by defining and upper-bounding the minimum excess risk (MER): the gap between the minimum expected loss attainable by learning from data and the minimum expected loss that could be achieved if the model realization were known. The definition of MER provides a principled way to define different notions of uncertainty in Bayesian learning, including the aleatoric uncertainty and the minimum epistemic uncertainty. Two methods for deriving upper bounds on the MER are presented. The first method, generally suitable for Bayesian learning with a parametric generative model, upper-bounds the MER by the conditional mutual information between the model parameters and the quantity being predicted, given the observed data. It allows us to quantify the rate at which the MER decays to zero as more data becomes available. Under realizable models, this method also relates the MER to the richness of the generative function class, notably the VC dimension in binary classification. The second method, particularly suitable for Bayesian learning with a parametric predictive model, relates the MER to the minimum estimation error of the model parameters from data. It explicitly shows how the uncertainty in model parameter estimation translates to the MER and to the final prediction uncertainty. We also extend the definition and analysis of the MER to the setting with multiple model families and the setting with nonparametric models. Along the way, we draw some comparisons between the MER in Bayesian learning and the excess risk in frequentist learning.
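As a concrete sketch of the verbal definition above, the MER can be written in hypothetical notation (the symbols here are illustrative assumptions, not necessarily the paper's exact notation): let $W$ denote the model parameters, $Z^n$ the training data generated given $W$, $(X, Y)$ a test pair, and $\ell$ the loss function. Then the gap between learning from data alone and knowing the model realization reads:

```latex
% MER: Bayesian risk using only the data Z^n, minus the
% expected risk achievable when the model realization W is known.
\mathrm{MER}
  \;=\;
  \inf_{\psi}\, \mathbb{E}\!\left[\,\ell\big(Y,\, \psi(X, Z^n)\big)\right]
  \;-\;
  \mathbb{E}\!\left[\, \inf_{\phi}\, \mathbb{E}\!\left[\,\ell\big(Y,\, \phi(X)\big) \,\middle|\, W \right]\right]
```

Here the first infimum is over decision rules $\psi$ that see only the data and the test input, while the inner infimum in the second term is over rules $\phi$ chosen with knowledge of $W$; the abstract's first bounding method then controls this gap via the conditional mutual information $I(W; Y \mid X, Z^n)$.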