Paper Title
Unbiased and Efficient Log-Likelihood Estimation with Inverse Binomial Sampling
Paper Authors
Paper Abstract
The fate of scientific hypotheses often relies on the ability of a computational model to explain the data, quantified in modern statistical approaches by the likelihood function. The log-likelihood is the key element for parameter estimation and model evaluation. However, the log-likelihood of complex models in fields such as computational biology and neuroscience is often intractable to compute analytically or numerically. In those cases, researchers can often only estimate the log-likelihood by comparing observed data with synthetic observations generated by model simulations. Standard techniques to approximate the likelihood via simulation either use summary statistics of the data or are at risk of producing severe biases in the estimate. Here, we explore another method, inverse binomial sampling (IBS), which can estimate the log-likelihood of an entire data set efficiently and without bias. For each observation, IBS draws samples from the simulator model until one matches the observation. The log-likelihood estimate is then a function of the number of samples drawn. This estimator has uniformly bounded variance, achieves the minimum variance possible for an unbiased estimator, and admits calibrated estimates of its variance. We provide theoretical arguments in favor of IBS and an empirical assessment of the method for maximum-likelihood estimation with simulation-based models. As case studies, we take three model-fitting problems of increasing complexity from computational and cognitive neuroscience. In all problems, IBS generally produces lower error in the estimated parameters and maximum log-likelihood values than alternative sampling methods with the same average number of samples. Our results demonstrate the potential of IBS as a practical, robust, and easy-to-implement method for log-likelihood evaluation when exact techniques are not available.
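
Since the abstract describes the core IBS procedure (draw samples from the simulator until one matches the observation; the log-likelihood estimate is then a function of the number of draws), a minimal Python sketch may help make it concrete. The snippet below is illustrative, not the authors' reference implementation: the simulator callable simulate(params, stimulus, rng) and the data arguments are hypothetical placeholders, and the per-trial estimate -(1 + 1/2 + ... + 1/(K-1)), where K is the number of draws up to and including the first match, is the unbiased IBS estimator of the trial's log-probability.

import numpy as np

def ibs_log_likelihood(simulate, params, stimuli, responses, rng=None):
    # IBS estimate of the log-likelihood of a whole data set:
    # for each trial, sample from the simulator until a match,
    # then add the per-trial estimate -(1 + 1/2 + ... + 1/(K-1)).
    rng = np.random.default_rng() if rng is None else rng
    total = 0.0
    for stimulus, response in zip(stimuli, responses):
        k = 1  # number of draws, including the first match
        while simulate(params, stimulus, rng) != response:
            k += 1
        # Empty sum when k == 1, i.e. the first draw already matched.
        total += -np.sum(1.0 / np.arange(1, k))
    return total

Note that this bare loop assumes every observed response has nonzero probability under the model; in practice one may want to cap the number of draws per trial as a safeguard, at the cost of reintroducing some bias.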