Paper title

Mixture of von Mises-Fisher distribution with sparse prototypes

Authors

Fabrice Rossi, Florian Barbaro

Abstract

Mixtures of von Mises-Fisher distributions can be used to cluster data on the unit hypersphere. This is particularly well suited to high-dimensional directional data such as texts. We propose in this article to estimate a von Mises-Fisher mixture using an ℓ1-penalized likelihood. This leads to sparse prototypes that improve clustering interpretability. We introduce an expectation-maximisation (EM) algorithm for this estimation and explore the trade-off between the sparsity term and the likelihood term with a path-following algorithm. The model's behaviour is studied on simulated data, and we show the advantages of the approach on real-data benchmarks. We also introduce a new data set on financial reports and exhibit the benefits of our method for exploratory analysis.
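The ℓ1-penalized EM estimation described in the abstract can be illustrated with a minimal sketch in Python with NumPy. It assumes the penalty is λ Σ_j ‖μ_j‖₁ on the unit-norm mean directions μ_j, which may not match the paper's exact parametrization. Under that assumption the μ_j update has a closed form: soft-threshold the responsibility-weighted data sum S_j at λ/κ_j and renormalise, or keep only the largest coordinate if every coordinate is thresholded to zero. The concentration κ_j is updated with the approximation of Banerjee et al. (2005), and the Bessel function in the vMF normalising constant is computed from its power series to avoid a SciPy dependency. The trade-off path is approximated by a warm-started grid over λ, a simple stand-in that may differ from the authors' path-following algorithm. All function names are hypothetical; this is not the authors' implementation.

```python
# Sketch under stated assumptions (penalty lam * sum_j ||mu_j||_1 on unit-norm
# mean directions); hypothetical helper names, not the authors' implementation.
import math

import numpy as np


def log_bessel_iv(v, x):
    """log I_v(x) for v >= 0 and x > 0, by log-sum-exp over the power series
    I_v(x) = sum_m (x/2)^(2m+v) / (m! * Gamma(m+v+1))."""
    n_terms = int(x) + 60  # terms shrink geometrically once m exceeds x
    j = np.arange(1, n_terms)
    log_fact = np.concatenate(([0.0], np.cumsum(np.log(j))))  # log m!
    log_gam = math.lgamma(v + 1) + np.concatenate(([0.0], np.cumsum(np.log(v + j))))
    m = np.arange(n_terms)
    terms = (2 * m + v) * math.log(x / 2) - log_fact - log_gam
    top = terms.max()
    return top + math.log(np.exp(terms - top).sum())


def log_vmf_const(d, kappa):
    """log C_d(kappa), the vMF normalising constant on the sphere S^(d-1), d >= 2."""
    v = d / 2 - 1
    return v * math.log(kappa) - (d / 2) * math.log(2 * math.pi) - log_bessel_iv(v, kappa)


def soft_threshold(r, t):
    return np.sign(r) * np.maximum(np.abs(r) - t, 0.0)


def update_direction(s, kappa, lam):
    """Closed-form argmax over unit vectors mu of kappa * mu.s - lam * ||mu||_1."""
    z = soft_threshold(s, lam / kappa)
    norm = np.linalg.norm(z)
    if norm > 0:
        return z / norm
    # Every coordinate was thresholded away: the best unit vector is one-hot
    # on the largest |s_j|, carrying its sign.
    mu = np.zeros_like(s)
    j = int(np.argmax(np.abs(s)))
    mu[j] = 1.0 if s[j] >= 0 else -1.0
    return mu


def sparse_vmf_em(X, k, lam, n_iter=100, init=None, seed=0):
    """EM-style fit of an l1-penalised vMF mixture. X: (n, d) unit-norm rows.
    Returns (pi, mu, kappa, resp); n_iter must be >= 1."""
    n, d = X.shape
    if init is None:
        rng = np.random.default_rng(seed)
        mu = X[rng.choice(n, size=k, replace=False)].copy()  # prototypes start at data points
        kappa, pi = np.full(k, 10.0), np.full(k, 1.0 / k)
    else:
        pi, mu, kappa = (np.array(a, dtype=float) for a in init)  # warm start (copies)
    for _ in range(n_iter):
        # E-step: responsibilities from log pi_j + log C_d(kappa_j) + kappa_j * mu_j.x_i
        log_c = np.array([log_vmf_const(d, kj) for kj in kappa])
        log_p = np.log(pi) + log_c + (X @ mu.T) * kappa
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: mixing weights, then sparse directions, then concentrations
        nk = resp.sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        S = resp.T @ X  # (k, d) responsibility-weighted sums of the data
        for j in range(k):
            mu[j] = update_direction(S[j], kappa[j], lam)
            # Mean resultant length along mu_j, then Banerjee et al.'s approximation
            # to the solution of A_d(kappa) = r_bar.
            r_bar = float(np.clip(mu[j] @ S[j] / nk[j], 1e-3, 0.999))
            kappa[j] = r_bar * (d - r_bar**2) / (1 - r_bar**2)
    return pi, mu, kappa, resp


def sparsity_path(X, k, lams, n_iter=100, seed=0):
    """Fit increasing lambdas, warm-starting each fit from the previous one."""
    fits, init = [], None
    for lam in sorted(lams):
        pi, mu, kappa, _ = sparse_vmf_em(X, k, lam, n_iter=n_iter, init=init, seed=seed)
        fits.append((lam, mu.copy()))
        init = (pi, mu, kappa)
    return fits


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d = 20
    # Two noisy clusters around the first two coordinate axes, projected to the sphere.
    X = np.vstack([np.eye(d)[c] + 0.1 * rng.standard_normal((100, d)) for c in (0, 1)])
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    for lam, mu in sparsity_path(X, k=2, lams=[0.0, 100.0, 1000.0]):
        print(lam, np.count_nonzero(mu, axis=1))
```

Running the file fits two clusters in d = 20 for three values of λ and prints the number of non-zero coordinates per prototype; larger λ should generally give sparser prototypes. Warm-starting each λ from the previous solution keeps successive fits close, which is the intuition behind following a path rather than refitting from scratch.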