论文标题

近似运行长度编码字符串之间的动态时间翘曲距离

Approximating Dynamic Time Warping Distance Between Run-Length Encoded Strings

论文作者

Xi, Zoe, Kuszmaul, William

论文摘要

动态时间扭曲(DTW)是一种广泛使用的相似性度量,用于比较编码时间序列数据的字符串以及应用于包括生物信息学,签名验证和语音识别的领域的应用程序。 DTW的标准动态编程算法需要$ O(n^2)$时间,并且有条件的下限表明没有算法可以做得更好。 但是,在许多应用程序中,字符串$ x $和$ y $可能包含长期重复的字母,这意味着可以使用运行长度编码来压缩它们。一个自然的问题是,是否可以根据压缩字符串的长度$ k $和$ \ ell $有效地计算这些压缩字符串之间的DTW距离。最近的工作显示了如何实现$ o(k \ ell^2 + \ ell k^2)$ time,留下了一个问题,即是否可能存在近乎二次的$ \ tilde {o}(k \ ell)$ - 时间算法可能存在。 我们表明,如果允许少量近似值损失,那么确实可以使用近乎二次的时间算法:我们的算法计算$(1 +ε)$ - $ dtw(x,y)$ in $ \ tilde {o \ tilde {o}(k \ ell /ε^3)$ k $ y $ k $和$ k $ y n $ s $ n n cull in $ n n culls $ n of $ k in $ n n culls $ n n culls $ n n culls in $ k $ n n culls in $和$ \ y culls in $ n n culls,我们的算法允许在任何公制空间$(σ,δ)$上计算$ dtw $,其中距离为$ O(log(log(n))$ - 位整数。令人惊讶的是,即使$δ$不诱导$σ$的度量空间(例如,$δ$不需要满足三角形不平等),该算法也可以工作。

Dynamic Time Warping (DTW) is a widely used similarity measure for comparing strings that encode time series data, with applications to areas including bioinformatics, signature verification, and speech recognition. The standard dynamic-programming algorithm for DTW takes $O(n^2)$ time, and there are conditional lower bounds showing that no algorithm can do substantially better. In many applications, however, the strings $x$ and $y$ may contain long runs of repeated letters, meaning that they can be compressed using run-length encoding. A natural question is whether the DTW-distance between these compressed strings can be computed efficiently in terms of the lengths $k$ and $\ell$ of the compressed strings. Recent work has shown how to achieve $O(k\ell^2 + \ell k^2)$ time, leaving open the question of whether a near-quadratic $\tilde{O}(k\ell)$-time algorithm might exist. We show that, if a small approximation loss is permitted, then a near-quadratic time algorithm is indeed possible: our algorithm computes a $(1 + ε)$-approximation for $DTW(x, y)$ in $\tilde{O}(k\ell / ε^3)$ time, where $k$ and $\ell$ are the number of runs in $x$ and $y$. Our algorithm allows for $DTW$ to be computed over any metric space $(Σ, δ)$ in which distances are $O(log(n))$-bit integers. Surprisingly, the algorithm also works even if $δ$ does not induce a metric space on $Σ$ (e.g., $δ$ need not satisfy the triangle inequality).

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源