Paper Title


Entropy Regularized Power k-Means Clustering

Paper Authors

Saptarshi Chakraborty, Debolina Paul, Swagatam Das, Jason Xu

Paper Abstract


Despite its well-known shortcomings, $k$-means remains one of the most widely used approaches to data clustering. Current research continues to tackle its flaws while attempting to preserve its simplicity. Recently, the \textit{power $k$-means} algorithm was proposed to avoid trapping in local minima by annealing through a family of smoother surfaces. However, the approach lacks theoretical justification and fails in high dimensions when many features are irrelevant. This paper addresses these issues by introducing \textit{entropy regularization} to learn feature relevance while annealing. We prove consistency of the proposed approach and derive a scalable majorization-minimization algorithm that enjoys closed-form updates and convergence guarantees. In particular, our method retains the same computational complexity as $k$-means and power $k$-means, but yields significant improvements over both. Its merits are thoroughly assessed on a suite of real and synthetic data experiments.
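To make the entropy-regularization idea concrete, here is a minimal illustrative sketch of Lloyd-style $k$-means with entropy-regularized feature weights. This is an assumption-laden simplification, not the paper's full power-mean annealing algorithm: it only shows how an entropy penalty $\lambda \sum_j w_j \log w_j$ on simplex-constrained feature weights yields a closed-form softmax update $w_j \propto \exp(-D_j/\lambda)$, where $D_j$ is the within-cluster dispersion along feature $j$, so that irrelevant features are automatically down-weighted. The function name and parameters below are hypothetical.

```python
import numpy as np

def entropy_weighted_kmeans(X, k, lam=5.0, n_iter=50, seed=0):
    """Sketch (not the paper's exact method): alternating minimization of
    sum_i sum_j w_j * (x_ij - c_{z_i, j})^2 + lam * sum_j w_j * log(w_j)
    over assignments z, centers c, and feature weights w on the simplex.
    The weight step has the closed form w_j ∝ exp(-D_j / lam)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    centers = X[rng.choice(n, size=k, replace=False)].copy()
    w = np.full(d, 1.0 / d)  # start with uniform feature weights
    for _ in range(n_iter):
        # Assignment step: nearest center under the weighted squared distance.
        d2 = (((X[:, None, :] - centers[None, :, :]) ** 2) * w).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Center step: mean of assigned points (empty clusters keep old center).
        for j in range(k):
            mask = labels == j
            if mask.any():
                centers[j] = X[mask].mean(axis=0)
        # Weight step: softmax of negative per-feature within-cluster dispersion.
        D = np.zeros(d)
        for j in range(k):
            mask = labels == j
            if mask.any():
                D += ((X[mask] - centers[j]) ** 2).sum(axis=0)
        logits = -D / lam
        logits -= logits.max()  # subtract max for numerical stability
        w = np.exp(logits) / np.exp(logits).sum()
    return labels, centers, w
```

On data where one feature separates the clusters and another is pure noise, the noisy feature accumulates large within-cluster dispersion and its weight collapses toward zero, which is the mechanism the abstract credits for avoiding the high-dimensional failure mode of plain power $k$-means.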
