分层聚类：0.585收入近似

论文标题

分层聚类：0.585收入近似

Hierarchical Clustering: a 0.585 Revenue Approximation

论文作者

Alon, Noga, Azar, Yossi, Vainstein, Danny

论文摘要

分层聚集树已被广泛接受为聚类数据的有用形式，导致采用包括系统发育学，图像分析，生物信息学等领域的流行率。最近，Dasgupta（STOC 16'）通过近似镜头开始对这些类型的算法进行分析。后来，Moseley和Wang（NIPS 17'）认为双重问题是收入目标功能。在这个问题中，给定每个对$ i，j \ in [n] = \ {1,2，\ ldots，n \} $的非负重量$ w_ {ij} $，目的是找到一个$ t $的叶子$ [n] $最大化函数$ \ sum_ \ sum_ {i <j i <n n] - | t_ {ij} |）$，其中$ | t_ {ij} | $是扎根的子树中的叶子数，最少是$ i $和$ j $的祖先。在我们的工作中，我们考虑收入目标功能并证明以下结果。首先，我们证明存在一个分解的存在（即，深度为2的树，其中有两个孩子，每个孩子是$ n/2 $叶子的父母），该父母近似于$ \ frac {1} {1} {2} $的一般最佳树解决方案（紧密）。其次，我们应用了此结果，以证明通用收入问题的$ \ frac {2} {3} p $近似，其中$ p $定义为Max-uncut分配问题的近似值。由于已知$ p $至少为0.8776（Wu等，2015； Austrin等，2016），因此我们获得了0.585的近似值算法。这改善了一系列早期结果，最终导致了0.4246-抗焦点保证（Ahmadian等，2019）。

Hierarchical Clustering trees have been widely accepted as a useful form of clustering data, resulting in a prevalence of adopting fields including phylogenetics, image analysis, bioinformatics and more. Recently, Dasgupta (STOC 16') initiated the analysis of these types of algorithms through the lenses of approximation. Later, the dual problem was considered by Moseley and Wang (NIPS 17') dubbing it the Revenue goal function. In this problem, given a nonnegative weight $w_{ij}$ for each pair $i,j \in [n]=\{1,2, \ldots ,n\}$, the objective is to find a tree $T$ whose set of leaves is $[n]$ that maximizes the function $\sum_{i<j \in [n]} w_{ij} (n -|T_{ij}|)$, where $|T_{ij}|$ is the number of leaves in the subtree rooted at the least common ancestor of $i$ and $j$. In our work we consider the revenue goal function and prove the following results. First, we prove the existence of a bisection (i.e., a tree of depth 2 in which the root has two children, each being a parent of $n/2$ leaves) which approximates the general optimal tree solution up to a factor of $\frac{1}{2}$ (which is tight). Second, we apply this result in order to prove a $\frac{2}{3}p$ approximation for the general revenue problem, where $p$ is defined as the approximation ratio of the Max-Uncut Bisection problem. Since $p$ is known to be at least 0.8776 (Wu et al., 2015, Austrin et al., 2016), we get a 0.585 approximation algorithm for the revenue problem. This improves a sequence of earlier results which culminated in an 0.4246-approximation guarantee (Ahmadian et al., 2019).

下载PDF全文

下载文献需遵守相关版权规定

论文标题