Paper Title

An Empirical Evaluation of $k$-Means Coresets

Authors

Chris Schwiegelshohn, Omar Ali Sheikh-Omar

Abstract

Coresets are among the most popular paradigms for summarizing data. In particular, there exist many high performance coresets for clustering problems such as $k$-means in both theory and practice. Curiously, there exists no work on comparing the quality of available $k$-means coresets. In this paper we perform such an evaluation. There currently is no algorithm known to measure the distortion of a candidate coreset. We provide some evidence as to why this might be computationally difficult. To complement this, we propose a benchmark for which we argue that computing coresets is challenging and which also allows us an easy (heuristic) evaluation of coresets. Using this benchmark and real-world data sets, we conduct an exhaustive evaluation of the most commonly used coreset algorithms from theory and practice.
