Paper Title
Self Normalizing Flows
Authors
Abstract
Efficient gradient computation of the Jacobian determinant term is a core problem in many machine learning settings, and especially so in the normalizing flow framework. Most proposed flow models therefore either restrict to a function class with easy evaluation of the Jacobian determinant, or an efficient estimator thereof. However, these restrictions limit the performance of such density models, frequently requiring significant depth to reach desired performance levels. In this work, we propose Self Normalizing Flows, a flexible framework for training normalizing flows by replacing expensive terms in the gradient by learned approximate inverses at each layer. This reduces the computational complexity of each layer's exact update from $\mathcal{O}(D^3)$ to $\mathcal{O}(D^2)$, allowing for the training of flow architectures which were otherwise computationally infeasible, while also providing efficient sampling. We show experimentally that such models are remarkably stable and optimize to similar data likelihood values as their exact gradient counterparts, while training more quickly and surpassing the performance of functionally constrained counterparts.
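To make the mechanism concrete (a sketch based on the standard change-of-variables setup, not spelled out in the abstract itself; the symbols $W$ and $R$ below are illustrative): for a single linear flow layer $z = Wx$ with $W \in \mathbb{R}^{D \times D}$, the layer's contribution to the log-likelihood is $\log |\det W|$, whose exact gradient

$$\frac{\partial}{\partial W} \log |\det W| = W^{-\top}$$

costs $\mathcal{O}(D^3)$ to compute. A self-normalizing layer instead learns an approximate inverse $R \approx W^{-1}$, kept close to the true inverse by a reconstruction penalty such as $\lVert x - R W x \rVert^2$, and substitutes $R^{\top}$ for $W^{-\top}$ in the gradient update. Each update then costs only $\mathcal{O}(D^2)$, and $R$ additionally provides efficient sampling.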