Paper title

Flow-Based Likelihoods for Non-Gaussian Inference

Authors

Ana Diaz Rivero, Cora Dvorkin

Abstract

We investigate the use of data-driven likelihoods to bypass a key assumption made in many scientific analyses, which is that the true likelihood of the data is Gaussian. In particular, we suggest using the optimization targets of flow-based generative models, a class of models that can capture complex distributions by transforming a simple base distribution through layers of nonlinearities. We call these flow-based likelihoods (FBLs). We analyze the accuracy and precision of the reconstructed likelihoods on mock Gaussian data, and show that simply gauging the quality of samples drawn from the trained model is not a sufficient indicator that the true likelihood has been learned. We nevertheless demonstrate that the likelihood can be reconstructed to a precision equal to that of the sampling error due to a finite sample size. We then apply FBLs to mock weak lensing convergence power spectra, a cosmological observable that is significantly non-Gaussian (NG). We find that the FBL captures the NG signatures in the data extremely well, while other commonly used data-driven likelihoods, such as Gaussian mixture models and independent component analysis, fail to do so. This suggests that works that have found small posterior shifts in NG data with data-driven likelihoods such as these could be underestimating the impact of non-Gaussianity in parameter constraints. By introducing a suite of tests that can capture different levels of NG in the data, we show that the success or failure of traditional data-driven likelihoods can be tied back to the structure of the NG in the data. Unlike other methods, the flexibility of the FBL makes it successful at tackling different types of NG simultaneously. Because of this, and their consequent likely applicability across datasets and domains, we encourage the use of FBLs for inference when sufficient mock data are available for training.
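Using a flow's optimization target as a likelihood rests on the change-of-variables formula: if x = f(z) with f invertible and z drawn from a simple base density p_z, then log p_x(x) = log p_z(f⁻¹(x)) + log |det ∂f⁻¹(x)/∂x|. The pure-Python sketch below is a toy illustration under stated assumptions, not the authors' FBL implementation. It stacks two hand-set, elementwise invertible layers (an affine map and a sinh transform that fattens the tails), evaluates the exact log-likelihood by inverting the layers, and computes the average negative log-likelihood that a flow would be trained to minimize. The names Affine, Sinh, Flow, mean_nll and gaussian_flow_fit are hypothetical. Practical flows learn their parameters by gradient descent and use layers that mix dimensions, and the comparison here is against a best-fit Gaussian only, not the Gaussian mixture or ICA baselines named in the abstract.

```python
# Toy sketch: a hand-set elementwise flow with no training loop; all names are hypothetical.
import math
import random


def std_normal_logpdf(z):
    """Log-density of an independent standard-normal base distribution."""
    return sum(-0.5 * zi * zi - 0.5 * math.log(2.0 * math.pi) for zi in z)


class Affine:
    """Elementwise x = scale * z + shift, with every scale > 0."""

    def __init__(self, scale, shift):
        self.scale, self.shift = list(scale), list(shift)

    def forward(self, z):
        return [s * zi + b for zi, s, b in zip(z, self.scale, self.shift)]

    def inverse(self, x):
        z = [(xi - b) / s for xi, s, b in zip(x, self.scale, self.shift)]
        log_det = -sum(math.log(s) for s in self.scale)  # log|det dz/dx|
        return z, log_det


class Sinh:
    """Elementwise x = sinh(z): turns Gaussian tails into heavier, non-Gaussian ones."""

    def forward(self, z):
        return [math.sinh(zi) for zi in z]

    def inverse(self, x):
        z = [math.asinh(xi) for xi in x]
        # d asinh(x)/dx = 1 / sqrt(1 + x^2), so log|det dz/dx| = -0.5 * sum log(1 + x^2)
        log_det = -0.5 * sum(math.log1p(xi * xi) for xi in x)
        return z, log_det


class Flow:
    """Stack of invertible layers, applied to base samples in the given order."""

    def __init__(self, layers):
        self.layers = layers

    def sample(self, rng, dim):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for layer in self.layers:
            v = layer.forward(v)
        return v

    def log_likelihood(self, x):
        # Change of variables: invert the layers last-to-first, accumulating log-Jacobians.
        v, total_log_det = list(x), 0.0
        for layer in reversed(self.layers):
            v, log_det = layer.inverse(v)
            total_log_det += log_det
        return std_normal_logpdf(v) + total_log_det


def mean_nll(flow, data):
    """Average negative log-likelihood: the quantity a flow is trained to minimize."""
    return -sum(flow.log_likelihood(x) for x in data) / len(data)


def gaussian_flow_fit(data):
    """Maximum-likelihood independent Gaussian per component, written as one Affine layer."""
    dim, n = len(data[0]), len(data)
    means = [sum(x[i] for x in data) / n for i in range(dim)]
    stds = [math.sqrt(sum((x[i] - means[i]) ** 2 for x in data) / n) for i in range(dim)]
    return Flow([Affine(stds, means)])


if __name__ == "__main__":
    rng = random.Random(0)
    true_flow = Flow([Sinh(), Affine([1.5], [0.3])])  # a non-Gaussian 1-D "data" model
    data = [true_flow.sample(rng, 1) for _ in range(5000)]
    print("NLL under generating flow:", mean_nll(true_flow, data))
    print("NLL under Gaussian fit:   ", mean_nll(gaussian_flow_fit(data), data))
```

On data drawn from the sinh-based flow, the Gaussian fit should give a higher average negative log-likelihood than the flow that generated the data, which is a one-dimensional echo of the abstract's point that a Gaussian likelihood misses NG structure. In a real FBL the layer parameters would be learned by minimizing mean_nll on mock data rather than set by hand.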