Gradient Estimation with Discrete Stein Operators

Paper Title

Gradient Estimation with Discrete Stein Operators

Authors

Jiaxin Shi, Yuhao Zhou, Jessica Hwang, Michalis K. Titsias, Lester Mackey

Abstract

Gradient estimation -- approximating the gradient of an expectation with respect to the parameters of a distribution -- is central to the solution of many machine learning problems. However, when the distribution is discrete, most common gradient estimators suffer from excessive variance. To improve the quality of gradient estimation, we introduce a variance reduction technique based on Stein operators for discrete distributions. We then use this technique to build flexible control variates for the REINFORCE leave-one-out estimator. Our control variates can be adapted online to minimize variance and do not require extra evaluations of the target function. In benchmark generative modeling tasks such as training binary variational autoencoders, our gradient estimator achieves substantially lower variance than state-of-the-art estimators with the same number of function evaluations.
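
For orientation, the REINFORCE leave-one-out (RLOO) estimator that the abstract names as its starting point can be sketched as follows. This is a minimal illustration of plain RLOO only, not the Stein-operator control variates proposed in the paper. The factorized Bernoulli distribution with logit parameters, the toy objective, and all function names (`rloo_gradient`, `toy_objective`, `exact_gradient`) are assumptions made for the example and do not come from the source.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def rloo_gradient(f, logits, num_samples, rng):
    """RLOO estimate of d/d(logits) E[f(x)], with x_i ~ Bernoulli(sigmoid(logits[i])).

    Each sample uses the mean of f over the OTHER samples as its baseline,
    so no extra evaluations of f are needed beyond one per sample.
    """
    if num_samples < 2:
        raise ValueError("the leave-one-out baseline needs at least 2 samples")
    probs = [sigmoid(t) for t in logits]
    samples = [[1 if rng.random() < p else 0 for p in probs]
               for _ in range(num_samples)]
    values = [f(x) for x in samples]
    total = sum(values)
    grad = [0.0] * len(logits)
    for x, fx in zip(samples, values):
        baseline = (total - fx) / (num_samples - 1)  # mean of f over the other samples
        for i, p in enumerate(probs):
            # For a Bernoulli with logit parameter, d/d(logit) log q(x_i) = x_i - p_i.
            grad[i] += (fx - baseline) * (x[i] - p) / num_samples
    return grad


def toy_objective(x):
    # Hypothetical target function, chosen so the exact gradient has a closed form.
    return sum((xi - 0.45) ** 2 for xi in x)


def exact_gradient(logits):
    # E[f] = sum_i p_i * 0.55^2 + (1 - p_i) * 0.45^2, differentiated w.r.t. each logit.
    return [sigmoid(t) * (1.0 - sigmoid(t)) * (0.55 ** 2 - 0.45 ** 2)
            for t in logits]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [0.0, 1.0]
    print("one RLOO estimate:", rloo_gradient(toy_objective, logits, 4, rng))
    print("exact gradient:   ", exact_gradient(logits))
```

A single estimate is noisy, but averaging many of them should approach the exact gradient because the leave-one-out baseline for a sample does not depend on that sample. The variance that remains in each estimate is what the paper's control variates aim to reduce.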
