An Empirical Evaluation of $k$-Means Coresets

Paper title

An Empirical Evaluation of $k$-Means Coresets

Authors

Chris Schwiegelshohn, Omar Ali Sheikh-Omar

Abstract

Coresets are among the most popular paradigms for summarizing data. In particular, there exist many high performance coresets for clustering problems such as $k$-means in both theory and practice. Curiously, there exists no work on comparing the quality of available $k$-means coresets. In this paper we perform such an evaluation. There currently is no algorithm known to measure the distortion of a candidate coreset. We provide some evidence as to why this might be computationally difficult. To complement this, we propose a benchmark for which we argue that computing coresets is challenging and which also allows us an easy (heuristic) evaluation of coresets. Using this benchmark and real-world data sets, we conduct an exhaustive evaluation of the most commonly used coreset algorithms from theory and practice.
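To make the notion of distortion concrete, the following is a minimal sketch of a heuristic check for a single candidate solution. It assumes distortion is the larger of the two ratios between the $k$-means cost on the full data and the weighted cost on the coreset. The function names `kmeans_cost` and `distortion` are hypothetical, and uniform sampling is used only as a naive stand-in for a coreset construction. A true coreset guarantee would have to hold for every candidate set of centers, which is the quantity the abstract says no known algorithm measures.

```python
import random


def kmeans_cost(points, centers, weights=None):
    """Weighted sum of squared distances from each point to its nearest center."""
    if weights is None:
        weights = [1.0] * len(points)
    total = 0.0
    for p, w in zip(points, weights):
        nearest = min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        total += w * nearest
    return total


def distortion(points, coreset, weights, centers):
    """Assumed definition: max of cost ratio and its inverse, for ONE set of centers."""
    full = kmeans_cost(points, centers)
    approx = kmeans_cost(coreset, centers, weights)
    return max(full / approx, approx / full)


random.seed(0)
# Synthetic 2-D data (illustrative only, not a data set from the paper).
points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(2000)]

# Naive stand-in for a coreset: uniform sample, each point weighted by n / m.
coreset = random.sample(points, 200)
weights = [len(points) / len(coreset)] * len(coreset)

# One arbitrary candidate solution with k = 3 centers.
centers = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(3)]

d = distortion(points, coreset, weights, centers)
print(d)
```

A value close to 1 means the weighted summary reproduces the full cost well for this particular set of centers; it says nothing about other center sets, so repeating the check over many candidate solutions gives only a lower bound on the worst-case distortion.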
