Paper Title

Simple, Scalable, and Stable Variational Deep Clustering

Paper Authors

Lele Cao, Sahar Asadi, Wenfei Zhu, Christian Schmidli, Michael Sjöberg

Paper Abstract

Deep clustering (DC) has become the state-of-the-art for unsupervised clustering. In principle, DC represents a variety of unsupervised methods that jointly learn the underlying clusters and the latent representation directly from unstructured datasets. However, DC methods are generally poorly applied due to high operational costs, low scalability, and unstable results. In this paper, we first evaluate several popular DC variants in the context of industrial applicability using eight empirical criteria. We then choose to focus on variational deep clustering (VDC) methods, since they mostly meet those criteria except for simplicity, scalability, and stability. To address these three unmet criteria, we introduce four generic algorithmic improvements: initial $γ$-training, periodic $β$-annealing, mini-batch GMM (Gaussian mixture model) initialization, and inverse min-max transform. We also propose a novel clustering algorithm S3VDC (simple, scalable, and stable VDC) that incorporates all those improvements. Our experiments show that S3VDC outperforms the state-of-the-art on both benchmark tasks and a large unstructured industrial dataset without any ground truth label. In addition, we analytically evaluate the usability and interpretability of S3VDC.
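
For intuition only, below is a minimal Python sketch of what a two-phase loss-weight schedule combining the first two improvements (initial $γ$-training followed by periodic $β$-annealing) could look like. The function name, its parameters, the constant value used during the initial phase, and the linear ramp are illustrative assumptions, not the authors' implementation or the exact schedules defined in the paper.

def loss_weight(step, gamma_steps, period, gamma=1e-3, beta_max=1.0):
    """Hypothetical schedule for the weight on the regularization term of a
    VDC-style loss, illustrating the two phases named in the abstract:
    an initial gamma-training phase followed by periodic beta-annealing.
    All names, default values, and the linear ramp are assumptions made
    purely for illustration.
    """
    if step < gamma_steps:
        # Initial gamma-training: hold the weight at a small constant gamma.
        return gamma
    # Periodic beta-annealing: within every period, ramp the weight
    # linearly from 0 up to beta_max, then restart the ramp.
    phase = ((step - gamma_steps) % period) / period
    return beta_max * phase


# Usage example: print the weight over the first few optimization steps.
if __name__ == "__main__":
    for s in range(0, 16, 2):
        print(f"step {s:2d} -> weight {loss_weight(s, gamma_steps=4, period=8):.3f}")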
