Paper Title
Bayesian Image Reconstruction using Deep Generative Models
Paper Authors
Paper Abstract
Machine learning models are commonly trained end-to-end and in a supervised setting, using paired (input, output) data. Examples include recent super-resolution methods that train on pairs of (low-resolution, high-resolution) images. However, these end-to-end approaches require re-training every time there is a distribution shift in the inputs (e.g., night images vs. daylight) or in relevant latent variables (e.g., camera blur or hand motion). In this work, we leverage state-of-the-art (SOTA) generative models (here, StyleGAN2) to build powerful image priors, which enable the application of Bayes' theorem to many downstream reconstruction tasks. Our method, Bayesian Reconstruction through Generative Models (BRGM), uses a single pre-trained generator model to solve different image restoration tasks, i.e., super-resolution and in-painting, by combining it with different forward corruption models. We keep the weights of the generator model fixed, and reconstruct the image by computing the Bayesian maximum a posteriori (MAP) estimate of the input latent vector that generated the reconstructed image. We further use variational inference to approximate the posterior distribution over the latent vectors, from which we sample multiple solutions. We demonstrate BRGM on three large and diverse datasets: (i) 60,000 images from the Flickr-Faces-HQ (FFHQ) dataset, (ii) 240,000 chest X-rays from MIMIC III, and (iii) a combined collection of 5 brain MRI datasets with 7,329 scans. Across all three datasets, and without any dataset-specific hyperparameter tuning, our simple approach yields performance competitive with current task-specific state-of-the-art methods on super-resolution and in-painting, while being more generalisable and requiring no training. Our source code and pre-trained models are available online: https://razvanmarinescu.github.io/brgm/.
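To make the reconstruction procedure concrete, here is a minimal PyTorch sketch of the MAP estimation described in the abstract, for the super-resolution case. It is an illustration under assumptions, not the paper's implementation: `load_pretrained_generator`, `load_observed_image`, the `G.latent_dim` attribute, and all hyperparameter values are hypothetical placeholders, and the forward corruption model shown is plain average-pool downsampling.

```python
import torch
import torch.nn.functional as F

# Hypothetical loader for a pre-trained StyleGAN2 generator that maps a
# latent vector w to an image; the generator's weights stay fixed throughout.
G = load_pretrained_generator("ffhq.pkl")
for p in G.parameters():
    p.requires_grad_(False)

def forward_corruption(x, factor=8):
    """Known forward model: downsampling for super-resolution.
    Swapping this for multiplication by a binary mask turns the
    same optimisation into in-painting."""
    return F.avg_pool2d(x, kernel_size=factor)

y = load_observed_image()  # low-resolution observation (placeholder)

# MAP estimate over the latent vector: maximise p(w | y) ∝ p(y | w) p(w),
# i.e. minimise the data misfit plus the negative log of a Gaussian prior.
w = torch.zeros(1, G.latent_dim, requires_grad=True)
opt = torch.optim.Adam([w], lr=0.05)
sigma2, lam = 0.01, 1.0  # noise variance and prior weight (assumed values)

for step in range(500):
    opt.zero_grad()
    x_hat = G(w)  # candidate reconstruction
    data_term = ((forward_corruption(x_hat) - y) ** 2).sum() / (2 * sigma2)
    prior_term = lam * (w ** 2).sum() / 2  # isotropic Gaussian prior on w
    (data_term + prior_term).backward()
    opt.step()

x_map = G(w).detach()  # MAP reconstruction
```

The variational-inference step can be sketched in the same style with a mean-field Gaussian approximation q(w) = N(mu, diag(sigma^2)), optimised with the reparameterisation trick; whether this matches the paper's exact variational family is an assumption. The block below reuses `G`, `forward_corruption`, `y`, and `sigma2` from the sketch above.

```python
# Approximate the posterior over w, then draw multiple plausible solutions.
mu = torch.zeros(1, G.latent_dim, requires_grad=True)
log_sig = torch.zeros(1, G.latent_dim, requires_grad=True)
opt = torch.optim.Adam([mu, log_sig], lr=0.05)

for step in range(500):
    opt.zero_grad()
    w = mu + log_sig.exp() * torch.randn_like(mu)  # reparameterised sample
    recon = ((forward_corruption(G(w)) - y) ** 2).sum() / (2 * sigma2)
    # Closed-form KL divergence between q(w) and a standard-normal prior
    kl = 0.5 * (mu ** 2 + (2 * log_sig).exp() - 2 * log_sig - 1).sum()
    (recon + kl).backward()
    opt.step()

# Each draw from q(w) yields a different reconstruction consistent with y.
samples = [G(mu + log_sig.exp() * torch.randn_like(mu)).detach()
           for _ in range(5)]
```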