Paper title
Bayesian mixture models (in)consistency for the number of clusters
Abstract
Bayesian nonparametric mixture models are commonly used to model complex data. While these models are well suited for density estimation, recent results have established posterior inconsistency of the number of clusters when the true number of components is finite, for Dirichlet process and Pitman-Yor process mixture models. We extend these results to additional Bayesian nonparametric priors such as Gibbs-type processes and finite-dimensional representations thereof. The latter include the Dirichlet multinomial process and the recently proposed Pitman-Yor multinomial and normalized generalized gamma multinomial processes. We show that mixture models based on these processes are also inconsistent in the number of clusters and discuss possible solutions. Notably, we show that a post-processing algorithm introduced for the Dirichlet process can be extended to more general models and provides a consistent method to estimate the number of components.
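As background for the cluster-count behaviour the abstract refers to, the sketch below looks only at the Dirichlet process prior through its Chinese restaurant process representation. It is not the paper's posterior argument or its post-processing algorithm. Under this prior, observation i + 1 opens a new cluster with probability alpha / (alpha + i), so the prior expected number of clusters among n observations is the sum of those probabilities and grows roughly like alpha log n. The function names and the concentration parameter name `alpha` are illustrative assumptions.

```python
import random


def expected_num_clusters(n: int, alpha: float) -> float:
    """Prior mean of the number of clusters K_n under a CRP with concentration alpha."""
    # Observation i + 1 starts a new cluster with probability alpha / (alpha + i).
    return sum(alpha / (alpha + i) for i in range(n))


def sample_num_clusters(n: int, alpha: float, rng: random.Random) -> int:
    """Draw K_n once by simulating the new-cluster events of the CRP."""
    k = 0
    for i in range(n):
        # For i = 0 the probability is 1, so the first observation always opens a cluster.
        if rng.random() < alpha / (alpha + i):
            k += 1
    return k


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (10, 100, 1000, 10000):
        mean_k = expected_num_clusters(n, alpha=1.0)
        draw_k = sample_num_clusters(n, alpha=1.0, rng=rng)
        print(f"n={n:>6}  E[K_n]={mean_k:.2f}  one draw K_n={draw_k}")
```

The prior mean keeps increasing with n, which gives some intuition for why a cluster count tied to such a prior need not settle at a finite true number of components. The posterior statements themselves are the subject of the paper.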