Infinite mixtures of multivariate normal-inverse Gaussian distributions for clustering of skewed data

Paper title

Infinite mixtures of multivariate normal-inverse Gaussian distributions for clustering of skewed data

Authors

Fang, Yuan; Karlis, Dimitris; Subedi, Sanjeena

Abstract

Mixtures of multivariate normal inverse Gaussian (MNIG) distributions can be used to cluster data that exhibit features such as skewness and heavy tails. However, in a traditional finite mixture model framework, the number of components must either be known a priori or be estimated a posteriori with some model selection criterion after obtaining results for a range of possible numbers of components. Different model selection criteria can sometimes select different numbers of components, which introduces uncertainty. Here, an infinite mixture model framework, also known as a Dirichlet process mixture model, is proposed for mixtures of MNIG distributions. This Dirichlet process mixture model approach allows the number of components to grow or decay freely from 1 to $\infty$ (in practice, from 1 to $N$), and the number of components is inferred along with the parameter estimates in a Bayesian framework, thus alleviating the need for model selection criteria. We provide real data applications with benchmark datasets, as well as a small simulation experiment comparing the approach with other existing models. The proposed method yields clustering results competitive with other clustering approaches on both simulated and real data, and parameter recovery is illustrated using simulation studies.
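As a rough illustration of how a Dirichlet process prior lets the number of occupied components vary between 1 and $N$, the sketch below draws cluster labels from the Chinese restaurant process, which is the partition distribution induced by a Dirichlet process with concentration parameter `alpha`. This is a minimal sketch under stated assumptions, not the authors' inference algorithm: the abstract does not describe their sampler, the function name `sample_crp_partition` and the `alpha` values are hypothetical, and no MNIG component parameters are fitted. It only shows the prior over partitions.

```python
import numpy as np


def sample_crp_partition(n, alpha, rng):
    """Draw cluster labels for n points from a Chinese restaurant process.

    Illustrative only (hypothetical helper): this samples the prior partition
    implied by a Dirichlet process with concentration `alpha`; it does not
    fit any MNIG (or other) component densities to data.
    """
    labels = np.empty(n, dtype=int)
    counts = []  # counts[k] = number of points currently assigned to cluster k
    for i in range(n):
        # Point i joins existing cluster k with probability counts[k] / (i + alpha)
        # and opens a new cluster with probability alpha / (i + alpha).
        probs = np.array(counts + [alpha], dtype=float) / (i + alpha)
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)  # a new cluster is opened
        else:
            counts[k] += 1
        labels[i] = k
    return labels, counts


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for alpha in (0.5, 5.0):  # hypothetical concentration values
        labels, counts = sample_crp_partition(200, alpha, rng)
        print(f"alpha={alpha}: {len(counts)} occupied clusters")
```

Larger `alpha` tends to open more clusters. For fixed `alpha`, the expected number of occupied clusters after $N$ draws is $\sum_{i=0}^{N-1} \alpha/(\alpha+i)$, which grows roughly like $\alpha \log N$, so the number of components is never fixed in advance but is bounded in practice by $N$.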
