Title

Gradient Estimation for Binary Latent Variables via Gradient Variance Clipping

Authors

Russell Z. Kunes, Mingzhang Yin, Max Land, Doron Haviv, Dana Pe'er, Simon Tavaré

Abstract

Gradient estimation is often necessary for fitting generative models with discrete latent variables, in contexts such as reinforcement learning and variational autoencoder (VAE) training. The DisARM estimator (Yin et al. 2020; Dong, Mnih, and Tucker 2020) achieves state-of-the-art gradient variance for Bernoulli latent variable models in many contexts. However, DisARM and other estimators have potentially exploding variance near the boundary of the parameter space, where solutions tend to lie. To ameliorate this issue, we propose a new gradient estimator, bitflip-1, that has lower variance at the boundaries of the parameter space. As bitflip-1 has complementary properties to existing estimators, we introduce an aggregated estimator, unbiased gradient variance clipping (UGC), that uses either a bitflip-1 or a DisARM gradient update for each coordinate. We theoretically prove that UGC has uniformly lower variance than DisARM. Empirically, we observe that UGC achieves the optimal value of the optimization objectives in toy experiments, discrete VAE training, and a best subset selection problem.
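To make the building blocks concrete, below is a minimal NumPy sketch of the two per-coordinate gradient updates the abstract refers to: DisARM in its standard antithetic form (Dong, Mnih, and Tucker 2020), and a bitflip-1-style estimator that re-evaluates the objective with each coordinate of the sample flipped. The function names, toy objective, and step size are illustrative assumptions rather than the authors' code, and UGC's rule for choosing between the two updates per coordinate is the paper's contribution and is not reproduced here.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def disarm_grad(f, phi, rng):
        # DisARM estimate of d/dphi E_{b ~ Bernoulli(sigmoid(phi))}[f(b)],
        # using one antithetic pair (b, b_tilde) driven by common uniforms.
        p = sigmoid(phi)
        u = rng.uniform(size=phi.shape)
        b = (u < p).astype(float)
        b_tilde = (1.0 - u < p).astype(float)
        half_diff = 0.5 * (f(b) - f(b_tilde))
        # Only coordinates where the antithetic pair disagrees carry signal.
        return half_diff * (-1.0) ** b_tilde * (b != b_tilde) * sigmoid(np.abs(phi))

    def bitflip1_grad(f, phi, rng):
        # bitflip-1-style estimate: for each coordinate i, the difference
        # f(b with b_i = 1) - f(b with b_i = 0) is an unbiased estimate of
        # the derivative w.r.t. p_i; the chain rule maps it back to phi.
        # Costs d + 1 objective evaluations for d coordinates.
        p = sigmoid(phi)
        b = (rng.uniform(size=phi.shape) < p).astype(float)
        f_b = f(b)
        g = np.empty_like(phi)
        for i in range(phi.size):
            b_flip = b.copy()
            b_flip[i] = 1.0 - b_flip[i]
            # (2 b_i - 1) orients the difference as f(.., 1, ..) - f(.., 0, ..).
            g[i] = (2.0 * b[i] - 1.0) * (f_b - f(b_flip))
        return g * p * (1.0 - p)  # chain rule: dp/dphi = sigmoid'(phi) = p (1 - p)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        target = np.array([1.0, 0.0, 1.0])
        f = lambda b: -np.sum((b - target) ** 2)  # hypothetical toy objective
        phi = np.zeros(3)
        for _ in range(500):
            phi += 0.1 * bitflip1_grad(f, phi, rng)  # gradient ascent on E[f]
        print(sigmoid(phi))  # probabilities pushed toward the boundary target

Because the optimum of this toy objective sits at the boundary of the parameter space (each p_i converging to 0 or 1), the demo exercises exactly the regime where, per the abstract, DisARM-style estimators can have exploding variance and bitflip-1 is better behaved.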
