论文标题
在锤子和编辑距离下的K型隔层
k-Approximate Quasiperiodicity under Hamming and Edit Distance
论文作者
论文摘要
大约30年前,引入了字符串中的Quasiperiodicity,作为弦周期性的扩展。准碘的基本概念是覆盖物和种子。文本$ t $的封面是一条字符串,其出现在$ t $中的所有职位$ t $。文字$ t $的种子是$ t $的超重封面。在各种应用中,由于存在误差,精确的quasiperiodicity仍然不够。我们考虑了Quasiperiodicity的大概概念,为此,我们允许在$ t $中大约发生,并带有小锤子,Levenshtein或加权编辑距离。 在先前的工作中,SIP等。 (2002)和Christodoulakis等。 (2005年)表明,在加权编辑距离下,计算近似覆盖物和种子是NP-HARD。因此,他们认为需要限制的近似覆盖物和种子,这些覆盖物和种子需要是原始字符串$ t $的因素,并提出了用于计算它们的多项式时间算法。 Guth等人在几项贡献中给出了其他算法,考虑到限制了$ k $的大概发生的算法。他们还研究了轻松的近似Quasiperiods,不需要覆盖$ t $的所有职位。 如果有大数据,多项式时间复杂性中的指数起着至关重要的作用。我们提出了更有效的算法来计算限制近似覆盖物和种子。特别是,我们改善了许多上述算法的复杂性,也可以改善放松的准环境。如果允许错误的数量(或总成本)有限,我们的解决方案特别有效。我们还显示了在锤距离下计算非限制性近似覆盖物和种子的NP硬度。 在过去三年中,CPM最近的三项贡献中研究了大约覆盖。但是,这些作品考虑了$ t $的大约封面的不同定义。
Quasiperiodicity in strings was introduced almost 30 years ago as an extension of string periodicity. The basic notions of quasiperiodicity are cover and seed. A cover of a text $T$ is a string whose occurrences in $T$ cover all positions of $T$. A seed of text $T$ is a cover of a superstring of $T$. In various applications exact quasiperiodicity is still not sufficient due to the presence of errors. We consider approximate notions of quasiperiodicity, for which we allow approximate occurrences in $T$ with a small Hamming, Levenshtein or weighted edit distance. In previous work Sip et al. (2002) and Christodoulakis et al. (2005) showed that computing approximate covers and seeds, respectively, under weighted edit distance is NP-hard. They, therefore, considered restricted approximate covers and seeds which need to be factors of the original string $T$ and presented polynomial-time algorithms for computing them. Further algorithms, considering approximate occurrences with Hamming distance bounded by $k$, were given in several contributions by Guth et al. They also studied relaxed approximate quasiperiods that do not need to cover all positions of $T$. In case of large data the exponents in polynomial time complexity play a crucial role. We present more efficient algorithms for computing restricted approximate covers and seeds. In particular, we improve upon the complexities of many of the aforementioned algorithms, also for relaxed quasiperiods. Our solutions are especially efficient if the number (or total cost) of allowed errors is bounded. We also show NP-hardness of computing non-restricted approximate covers and seeds under Hamming distance. Approximate covers were studied in three recent contributions at CPM over the last three years. However, these works consider a different definition of an approximate cover of $T$.