Paper title
Optimal Bayesian estimation of Gaussian mixtures with growing number of components
Paper authors
Paper abstract
We study Bayesian estimation of finite mixture models in a general setup where the number of components is unknown and allowed to grow with the sample size. Assuming a growing number of components is natural: the degree of heterogeneity present in the sample can grow and new components can arise as the sample size increases, which allows full flexibility in modeling the complexity of the data. This, however, leads to a high-dimensional model, which poses great challenges for estimation. We make novel use of a sample-size-dependent prior in a Bayesian model and establish a number of important theoretical results. We first show that, under mild conditions on the prior, the posterior distribution concentrates around the true mixing distribution at a near-optimal rate with respect to the Wasserstein distance. Under a separation condition on the true mixing distribution, we further show that a better and adaptive convergence rate can be achieved and that the number of components can be consistently estimated. Furthermore, we derive optimal convergence rates for higher-order mixture models in which the number of components diverges arbitrarily fast. In addition, we suggest a simple recipe for using a Dirichlet process (DP) mixture prior to estimate finite mixture models and provide theoretical guarantees. In particular, we provide a novel solution for adopting the number of clusters in a DP mixture model as an estimate of the number of components in a finite mixture model. A simulation study and real data applications are carried out to demonstrate the utility of our method.
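To make the notion of distance between mixing distributions concrete, the following is a minimal sketch, not the authors' code. It assumes univariate component locations and the order-1 Wasserstein distance; the abstract does not state which order is used. The function name `wasserstein1` and the example atoms and weights are hypothetical. On the real line, the order-1 distance between two discrete distributions equals the integral of the absolute difference of their CDFs, which is what the loop accumulates.

```python
def wasserstein1(atoms_p, weights_p, atoms_q, weights_q):
    """Order-1 Wasserstein distance between two discrete distributions on the real line.

    Assumption for illustration: each mixing distribution is a finite set of
    real-valued atoms (component locations) with weights summing to one.
    """
    # Signed jumps of F_P - F_Q: positive at atoms of P, negative at atoms of Q.
    events = [(a, w) for a, w in zip(atoms_p, weights_p)]
    events += [(a, -w) for a, w in zip(atoms_q, weights_q)]
    events.sort(key=lambda e: e[0])

    dist = 0.0
    cdf_diff = 0.0        # current value of F_P(x) - F_Q(x)
    prev = events[0][0]   # left end of the current interval
    for x, jump in events:
        # |F_P - F_Q| is constant on [prev, x); add its area.
        dist += abs(cdf_diff) * (x - prev)
        cdf_diff += jump
        prev = x
    return dist


# Hypothetical example: a two-component mixing distribution versus a
# three-component one in which the atom at 1.0 is split into 0.5 and 1.5.
p_atoms, p_weights = [-1.0, 1.0], [0.5, 0.5]
q_atoms, q_weights = [-1.0, 0.5, 1.5], [0.5, 0.25, 0.25]
print(wasserstein1(p_atoms, p_weights, q_atoms, q_weights))  # 0.25
```

In this example the two mixing distributions have different numbers of atoms, yet the distance is small because the extra atoms lie close to the atom they replace. This is one reason a transport-based distance is convenient when the number of components is unknown.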