论文标题
近似$(k,\ ell)$ - 多边形曲线的中间聚类
Approximating $(k,\ell)$-Median Clustering for Polygonal Curves
论文作者
论文摘要
2015年,Driemel,Krivošija和Sohler引入了$(K,\ ell)$ - 在Fréchet距离下聚类多边形曲线的中位数问题。给定一组输入曲线,问题要求找到$ k $的中间曲线,最多是$ \ ell $ pertices,每个曲线都将所有输入曲线的fréchet距离之和最小化到其最接近的中值曲线。其算法的一个主要缺点是,输入曲线仅限于实际线路。在本文中,我们提出了一种随机的双晶型算法,该算法适用于$ \ Mathbb {r}^d $中的多边形曲线,并实现了相对于聚类成本的近似因子$(1+ε)$。该算法在曲线的数量上具有最差的运行时间线性,每条曲线的最大顶点数量,即它们的复杂性,并且在$ d $,$ \ ell $,$ε$,$ε$和$Δ$中,指数呈指数,即失败概率。我们通过捷径引理实现了这一结果,该曲目保证了具有相似成本的多边形曲线的存在,作为$ \ ell $的最佳中值曲线,但最多的复杂性,最多$ 2 \ ell-2 $,并且可以有效地计算出其顶点。我们将这种引理与Kumar等人的超集采样技术相结合。得出我们的聚类结果。为此,我们描述和分析了Ackermann等人对算法的概括,这可能具有独立的关注。
In 2015, Driemel, Krivošija and Sohler introduced the $(k,\ell)$-median problem for clustering polygonal curves under the Fréchet distance. Given a set of input curves, the problem asks to find $k$ median curves of at most $\ell$ vertices each that minimize the sum of Fréchet distances over all input curves to their closest median curve. A major shortcoming of their algorithm is that the input curves are restricted to lie on the real line. In this paper, we present a randomized bicriteria-approximation algorithm that works for polygonal curves in $\mathbb{R}^d$ and achieves approximation factor $(1+ε)$ with respect to the clustering costs. The algorithm has worst-case running-time linear in the number of curves, polynomial in the maximum number of vertices per curve, i.e. their complexity, and exponential in $d$, $\ell$, $ε$ and $δ$, i.e., the failure probability. We achieve this result through a shortcutting lemma, which guarantees the existence of a polygonal curve with similar cost as an optimal median curve of complexity $\ell$, but of complexity at most $2\ell-2$, and whose vertices can be computed efficiently. We combine this lemma with the superset-sampling technique by Kumar et al. to derive our clustering result. In doing so, we describe and analyze a generalization of the algorithm by Ackermann et al., which may be of independent interest.