Paper Title

Newtonian Monte Carlo: single-site MCMC meets second-order gradient methods

Authors

Nimar S. Arora, Nazanin Khosravani Tehrani, Kinjal Divesh Shah, Michael Tingley, Yucen Lily Li, Narjes Torabi, David Noursi, Sepehr Akhavan Masouleh, Eric Lippert, Erik Meijer

Abstract

Single-site Markov Chain Monte Carlo (MCMC) is a variant of MCMC in which a single coordinate in the state space is modified in each step. Structured relational models are a good candidate for this style of inference. In the single-site context, second-order methods become feasible because the typical cubic cost associated with these methods is now restricted to the dimension of each coordinate. Our work, which we call Newtonian Monte Carlo (NMC), is a method to improve MCMC convergence by analyzing the first- and second-order gradients of the target density to determine a suitable proposal density at each point. Existing first-order gradient-based methods suffer from the problem of determining an appropriate step size: if the step size is too small, convergence requires a large number of steps, while too large a step size causes the sampler to overshoot the high-density region. NMC is similar to the Newton-Raphson update in optimization, where the second-order gradient is used to automatically scale the step size in each dimension. However, our objective is to find a parameterized proposal density rather than a maximum. As a further improvement on existing first- and second-order methods, we show that random variables with constrained supports do not need to be transformed before taking a gradient step. We demonstrate the efficiency of NMC on a number of different domains. For statistical models where the prior is conjugate to the likelihood, our method recovers the posterior quite trivially in one step. However, we also show results on fairly large non-conjugate models, where NMC performs better than adaptive first-order methods such as NUTS and other inexact scalable inference methods such as Stochastic Variational Inference or bootstrapping.
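The Newton-Raphson analogy in the abstract can be illustrated with a minimal sketch: at the current point, the gradient and Hessian of the log density define a local Gaussian approximation, whose mean is one Newton step away and whose covariance is the negative inverse Hessian. This is an illustrative reading of the idea, not the paper's implementation; `newton_proposal` is a hypothetical helper, and it assumes the Hessian is negative definite at the current point.

```python
import numpy as np

def newton_proposal(grad, hess, x):
    """Newton-style Gaussian proposal for a target log density.

    Mean: one Newton-Raphson step from x (the second-order gradient
    scales the step size in each dimension automatically).
    Covariance: negative inverse Hessian of the log density at x.
    Assumes hess(x) is negative definite (locally log-concave).
    """
    g = grad(x)
    H = hess(x)
    cov = -np.linalg.inv(H)   # proposal covariance
    mean = x + cov @ g        # x - H^{-1} g, the Newton step
    return mean, cov

# Example: standard normal target, log p(x) = -x^T x / 2.
grad = lambda x: -x
hess = lambda x: -np.eye(len(x))
mean, cov = newton_proposal(grad, hess, np.array([3.0]))
# For a Gaussian target the Newton step lands exactly on the mode,
# and the proposal covariance matches the target covariance, which
# is why conjugate (Gaussian-like) posteriors are recovered in one step.
```

In a full sampler this proposal would be embedded in a Metropolis-Hastings accept/reject step, applied one coordinate at a time in the single-site setting.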
