论文标题

通过K-Median和K均值的弱下限约束来实现匿名

Achieving anonymity via weak lower bound constraints for k-median and k-means

论文作者

Arutyunova, Anna, Schmidt, Melanie

论文摘要

我们研究了$ k $ - 集群问题,其中包括$ k $ - 米德式和$ k $ -MEANS的聚类和下限。除了点集$ p $和中心数量$ k $之外,(均匀)下限的$ k $ clustering问题还获得了数字$ b $。解决方案空间仅限于每个集群至少具有$ b $点的聚类。我们演示了如何通过将$ o(1)$ - 近似算法的$ o(1)$ o(1)$ o(1)$ k $ k $ median进行近似。 然后,我们提出了一个新的约束聚类问题,其中较低界限允许多次分配点(给不同的中心)。这意味着,对于每个点,聚类指定了分配给其分配的一组中心。我们将此聚类称为较弱的下限。我们给出$(6.5+ε)$ - $ K $ -Median聚类的近似值,下限较弱,$ O(1)$ - 近似$ K $ - $ k $ -MEAN-MEANS具有较弱的下限。 我们结论是,以近似因素不断增加,我们可以将每个点的任务数限制为$ 2 $(或者,如果允许分数分配,则为$ 1+ε$)。这也导致了第一个具有(标准)下限的$ k $ - eaneans的双质近似算法,而在某种意义上,在恒定因素违反了较低界限的意义上解释了双粒子。 本文中的所有算法及时运行,在$ n $和$ k $中是多项式(以及欧几里得变种的$ d $)。

We study $k$-clustering problems with lower bounds, including $k$-median and $k$-means clustering with lower bounds. In addition to the point set $P$ and the number of centers $k$, a $k$-clustering problem with (uniform) lower bounds gets a number $B$. The solution space is restricted to clusterings where every cluster has at least $B$ points. We demonstrate how to approximate $k$-median with lower bounds via a reduction to facility location with lower bounds, for which $O(1)$-approximation algorithms are known. Then we propose a new constrained clustering problem with lower bounds where we allow points to be assigned multiple times (to different centers). This means that for every point, the clustering specifies a set of centers to which it is assigned. We call this clustering with weak lower bounds. We give a $(6.5+ε)$-approximation for $k$-median clustering with weak lower bounds and an $O(1)$-approximation for $k$-means with weak lower bounds. We conclude by showing that at a constant increase in the approximation factor, we can restrict the number of assignments of every point to $2$ (or, if we allow fractional assignments, to $1+ε$). This also leads to the first bicritera approximation algorithm for $k$-means with (standard) lower bounds where bicriteria is interpreted in the sense that the lower bounds are violated by a constant factor. All algorithms in this paper run in time that is polynomial in $n$ and $k$ (and $d$ for the Euclidean variants considered).

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源