论文标题

用于计算和嵌入间隙编辑距离的司额时间算法

Sublinear-Time Algorithms for Computing & Embedding Gap Edit Distance

论文作者

Kociumaka, Tomasz, Saha, Barna

论文摘要

在本文中,我们设计了用于解决差距编辑距离问题并嵌入编辑距离到锤击距离的新的均等时间算法。对于差距编辑距离问题,我们给出了$ \ tilde {o}(\ frac {n} {k}+k^2)$ - 时间贪婪算法,该算法区分长度 - $ n $ n $输入条带有编辑距离的最多$ k $和具有编辑距离的$ k $的$(3K+5)k $。这是对Goldenberg,Krauthgamer和Saha [focs 2019]结果的改进和简化,其中$ k $ vs $θ(k^2)$差距编辑距离问题在$ \ tilde {o}中解决了(\ frac {n} {n} {k} {k}+k^3)$ time。我们进一步概括了我们的结果,以解决$ k $ vs $ k'$差距编辑距离问题$ \ tilde {o}(\ frac {nk} {k'}+ k^2+ k^2+ \ frac {k^2} {k'} {k'} {k'} \ sqrt {nk}) $ \ tilde {o}(\ frac {nk} {k'}+k^3)$。最后,我们表明,如果输入字符串没有很长的高周期子字符串,那么$ k $ vs $(1+ε)k $ gap编辑距离问题可以在sublinear时间内解决。具体而言,如果字符串最多包含$ 2K $的长度$ \ ell $的子字符串,那么我们实现的运行时间为$ \ tilde {o}(\ frac {n} {ε^2 k}+k}+k^2 \ ell)$。 我们进一步给出了第一个均值时间概率嵌入到锤距的距离。对于任何参数$ p $,我们的$ \ tilde {o}(\ frac {n} {p})$ - 时间过程可以产生带有失真$ O(kp)$的嵌入,其中$ k $是原始字符串的编辑距离。具体而言,所得字符串的锤量距离为$ \ frac {k-p+1} {p+1} $和$ o(k^2)$,概率很好。这概括了Chakraborty,Goldenberg和Koucký[STOC 2016]的线性时间嵌入,在那里,由此产生的Hamming距离在$ \ frac K2 $和$ O(k^2)$之间。我们的算法基于对样品的随机步行,我们相信该样品将在sublinear-time算法中找到其他应用程序。

In this paper, we design new sublinear-time algorithms for solving the gap edit distance problem and for embedding edit distance to Hamming distance. For the gap edit distance problem, we give an $\tilde{O}(\frac{n}{k}+k^2)$-time greedy algorithm that distinguishes between length-$n$ input strings with edit distance at most $k$ and those with edit distance exceeding $(3k+5)k$. This is an improvement and a simplification upon the result of Goldenberg, Krauthgamer, and Saha [FOCS 2019], where the $k$ vs $Θ(k^2)$ gap edit distance problem is solved in $\tilde{O}(\frac{n}{k}+k^3)$ time. We further generalize our result to solve the $k$ vs $k'$ gap edit distance problem in time $\tilde{O}(\frac{nk}{k'}+k^2+ \frac{k^2}{k'}\sqrt{nk})$, strictly improving upon the previously known bound $\tilde{O}(\frac{nk}{k'}+k^3)$. Finally, we show that if the input strings do not have long highly periodic substrings, then already the $k$ vs $(1+ε)k$ gap edit distance problem can be solved in sublinear time. Specifically, if the strings contain no substring of length $\ell$ with period at most $2k$, then the running time we achieve is $\tilde{O}(\frac{n}{ε^2 k}+k^2\ell)$. We further give the first sublinear-time probabilistic embedding of edit distance to Hamming distance. For any parameter $p$, our $\tilde{O}(\frac{n}{p})$-time procedure yields an embedding with distortion $O(kp)$, where $k$ is the edit distance of the original strings. Specifically, the Hamming distance of the resultant strings is between $\frac{k-p+1}{p+1}$ and $O(k^2)$ with good probability. This generalizes the linear-time embedding of Chakraborty, Goldenberg, and Koucký [STOC 2016], where the resultant Hamming distance is between $\frac k2$ and $O(k^2)$. Our algorithm is based on a random walk over samples, which we believe will find other applications in sublinear-time algorithms.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源