Paper Title
Dynamic Dual-Output Diffusion Models
Paper Authors
Paper Abstract
Iterative denoising-based generation, also known as denoising diffusion models, has recently been shown to be comparable in quality to other classes of generative models, and even to surpass them, including, in particular, Generative Adversarial Networks, which are currently the state of the art in many sub-tasks of image generation. However, a major drawback of this method is that it requires hundreds of iterations to produce a competitive result. Recent works have proposed solutions that allow for faster generation with fewer iterations, but image quality gradually deteriorates as fewer and fewer iterations are applied during generation. In this paper, we reveal some of the causes that degrade the generation quality of diffusion models, especially when sampling with few iterations, and propose a simple yet effective solution to mitigate them. We consider two opposite equations for the iterative denoising: the first predicts the applied noise, and the second predicts the image directly. Our solution takes both options and learns to dynamically alternate between them throughout the denoising process. The proposed solution is general and can be applied to any existing diffusion model. As we show, when applied to various SOTA architectures, it immediately improves their generation quality, with negligible added complexity and parameters. We experiment on multiple datasets and configurations and run an extensive ablation study to support these findings.
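To make the "dynamic alternation" concrete, below is a minimal PyTorch-style sketch of how the two parameterizations could be combined at a single denoising step, assuming standard DDPM notation (forward process x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps). The function name `dual_output_step`, the three-headed `model` output, and the blend weight `w` are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def dual_output_step(model, x_t, t, alpha_bar_t):
    """Combine the two denoising parameterizations at one step.

    Assumed (hypothetical) model interface for (x_t, t):
      eps_hat : predicted applied noise (epsilon-parameterization)
      x0_hat  : directly predicted clean image (x0-parameterization)
      w       : blend weight in [0, 1], broadcastable to x_t's shape
    """
    eps_hat, x0_hat, w = model(x_t, t)

    # Convert the noise prediction into an x0 estimate via the standard
    # DDPM relation: x0 = (x_t - sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_bar_t).
    x0_from_eps = (x_t - torch.sqrt(1.0 - alpha_bar_t) * eps_hat) / torch.sqrt(alpha_bar_t)

    # Dynamically alternate (interpolate) between the two estimates;
    # the combined x0 would then feed the usual posterior sampling step.
    x0_combined = w * x0_hat + (1.0 - w) * x0_from_eps
    return x0_combined
```

Under this reading, a weight near 0 reduces to ordinary noise prediction, a weight near 1 to direct image prediction, and learning the weight per step lets the sampler favor whichever estimate is more reliable at that noise level.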