Title
Are you using test log-likelihood correctly?
Authors
Abstract
Test log-likelihood is commonly used to compare different models of the same data or different approximate inference algorithms for fitting the same probabilistic model. We present simple examples demonstrating how comparisons based on test log-likelihood can contradict comparisons according to other objectives. Specifically, our examples show that (i) approximate Bayesian inference algorithms that attain higher test log-likelihoods need not also yield more accurate posterior approximations and (ii) conclusions about forecast accuracy based on test log-likelihood comparisons may not agree with conclusions based on root mean squared error.
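Both claims can be illustrated with a small closed-form computation. The sketch below is not the paper's example. The test points, the two forecasts, the conjugate model y ~ N(θ, 1) with prior θ ~ N(0, 1), the hypothetical "close" and "wide" posterior approximations, and the use of the 2-Wasserstein distance as the measure of posterior accuracy are all assumptions chosen for illustration. In part (i) the data spread exceeds the model's assumed unit noise variance, so the model is misspecified.

```python
import math


def gaussian_log_pdf(y, mean, var):
    """Log density of N(mean, var) evaluated at y."""
    return -0.5 * math.log(2.0 * math.pi * var) - (y - mean) ** 2 / (2.0 * var)


def mean_test_log_likelihood(ys, mean, var):
    """Average log predictive density of the test points under N(mean, var)."""
    return sum(gaussian_log_pdf(y, mean, var) for y in ys) / len(ys)


def rmse(ys, mean):
    """Root mean squared error of the point forecast `mean`."""
    return math.sqrt(sum((y - mean) ** 2 for y in ys) / len(ys))


def w2_gaussian(m1, v1, m2, v2):
    """2-Wasserstein distance between 1-D Gaussians N(m1, v1) and N(m2, v2)."""
    return math.sqrt((m1 - m2) ** 2 + (math.sqrt(v1) - math.sqrt(v2)) ** 2)


# (ii) Hypothetical toy forecasts (not from the paper): TLL and RMSE disagree.
ys = [-1.0, -0.5, 0.5, 1.0]
forecast_a = (0.1, 0.1)  # (mean, variance): accurate mean, overconfident
forecast_b = (0.5, 1.0)  # (mean, variance): biased mean, wider variance
for name, (m, v) in [("A", forecast_a), ("B", forecast_b)]:
    print(name, "TLL", round(mean_test_log_likelihood(ys, m, v), 3),
          "RMSE", round(rmse(ys, m), 3))

# (i) Assumed model: y ~ N(theta, 1), prior theta ~ N(0, 1).  The toy data
# have spread 4, larger than the assumed noise variance 1 (misspecification).
train = [-2.0, 2.0, -2.0, 2.0]
test = [-2.0, 2.0]
n = len(train)
post_var = 1.0 / (1.0 + n)          # exact conjugate posterior variance
post_mean = sum(train) / (1.0 + n)  # exact posterior mean, n * ybar / (1 + n)
approximations = {                  # hypothetical approximations q = N(m, v)
    "exact": (post_mean, post_var),
    "close": (post_mean, 0.25),
    "wide": (post_mean, 3.0),
}
for name, (m, v) in approximations.items():
    # Under q = N(m, v), the posterior predictive for a new y is N(m, 1 + v).
    tll = mean_test_log_likelihood(test, m, 1.0 + v)
    dist = w2_gaussian(m, v, post_mean, post_var)
    print(name, "TLL", round(tll, 3), "W2 to exact posterior", round(dist, 3))
```

In this toy setup forecast B attains the higher test log-likelihood despite its larger RMSE. In the misspecified model the "wide" approximation attains the highest test log-likelihood even though it is farthest from the exact posterior, and the exact posterior itself scores lowest: inflating the posterior variance widens the predictive distribution toward the true spread of the data, which the likelihood rewards.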