近似于二元二进制最长的常见子序列几乎是线性时间

论文标题

近似于二元二进制最长的常见子序列几乎是线性时间

Approximating binary longest common subsequence in almost-linear time

论文作者

He, Xiaoyu, Li, Ray

论文摘要

最长的常见子序列（LCS）是一个基本的字符串相似度度量，计算两个字符串的LCS是一个经典的算法问题。教科书动态编程算法在二次时间时给出了精确的算法，在合理的细粒复杂性假设下，这绝对是最好的，因此自然问题是找到更快的近似算法。当输入是两个二进制字符串时，有一个简单的$ \ frac {1} {2} $ - 线性时间中的近似值：计算最长的通用全0或全1个子序列。即使在真正的次级时期，即使是可以实现更好的近似值，这是开放的。鲁宾斯坦和歌曲表明，在两个输入字符串相等长度的假设下，答案是肯定的。我们解决了这个问题，将其结果推广到不平等的长度字符串，证明，对于任何$ \ varepsilon> 0 $，存在$δ> 0 $和$（\ frac {1} {1} {2}+δ）$ - 近似算法的二进制LC，用于$ n^{1+ \ varepsiLon} $ n^in^in^n^n^n^n^fireps $ \ vareps $} $。由于我们的结果以及Akmal和Vassilevska-Williams的结果，对于任何$ \ varepsilon> 0 $，存在$（\ frac {1} {q}+Δ）$ - lcs of $ q $ y-q $ y-q $ n^n^{1+n^{1+v varepsilon} $的lcsials y-artimation。我们的技术建立在Guruswami，He和Li的最新工作的基础上，他们证明了可容忍删除错误的错误校正代码的新范围。它们证明了一个组合“结构引理”，用于根据其振荡模式对它们进行分类。我们证明并使用该结构引理的算法概括，这可能具有独立的利益。

The Longest Common Subsequence (LCS) is a fundamental string similarity measure, and computing the LCS of two strings is a classic algorithms question. A textbook dynamic programming algorithm gives an exact algorithm in quadratic time, and this is essentially best possible under plausible fine-grained complexity assumptions, so a natural problem is to find faster approximation algorithms. When the inputs are two binary strings, there is a simple $\frac{1}{2}$-approximation in linear time: compute the longest common all-0s or all-1s subsequence. It has been open whether a better approximation is possible even in truly subquadratic time. Rubinstein and Song showed that the answer is yes under the assumption that the two input strings have equal lengths. We settle the question, generalizing their result to unequal length strings, proving that, for any $\varepsilon>0$, there exists $δ>0$ and a $(\frac{1}{2}+δ)$-approximation algorithm for binary LCS that runs in $n^{1+\varepsilon}$ time. As a consequence of our result and a result of Akmal and Vassilevska-Williams, for any $\varepsilon>0$, there exists a $(\frac{1}{q}+δ)$-approximation for LCS over $q$-ary strings in $n^{1+\varepsilon}$ time. Our techniques build on the recent work of Guruswami, He, and Li who proved new bounds for error-correcting codes tolerating deletion errors. They prove a combinatorial "structure lemma" for strings which classifies them according to their oscillation patterns. We prove and use an algorithmic generalization of this structure lemma, which may be of independent interest.

下载PDF全文

下载文献需遵守相关版权规定

论文标题