Paper title
Elastic Product Quantization for Time Series
Paper authors
Paper abstract
Analyzing numerous or long time series is difficult in practice due to high storage costs and computational requirements. Therefore, techniques have been proposed to generate compact similarity-preserving representations of time series, enabling real-time similarity search on large in-memory data collections. However, the existing techniques are not ideally suited for assessing similarity when sequences are locally out of phase. In this paper, we propose the use of product quantization for efficient similarity-based comparison of time series under time warping. The idea is to first compress the data by partitioning each time series into equal-length sub-sequences, each of which is represented by a short code. The distance between two time series can then be efficiently approximated by pre-computed elastic distances between their codes. The partitioning into sub-sequences forces unwanted alignments, which we address with a pre-alignment step using the maximal overlap discrete wavelet transform (MODWT). To demonstrate the efficiency and accuracy of our method, we perform an extensive experimental evaluation on benchmark datasets in nearest-neighbor classification and clustering applications. Overall, the proposed solution emerges as a highly efficient (in terms of both memory usage and computation time) replacement for elastic measures in time series applications.
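The compress-then-look-up idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several assumptions: dynamic time warping (DTW) with squared point costs stands in for the elastic distance, codebooks are built by sampling training sub-sequences at random rather than by any clustering procedure, the MODWT pre-alignment step is omitted, and all function names (`dtw`, `split`, `train_codebooks`, `encode`, `distance_tables`, `approx_distance`) are hypothetical.

```python
import random

def dtw(a, b):
    """Dynamic time warping distance with squared point-wise costs."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        cur = [inf]
        for j, y in enumerate(b, start=1):
            cost = (x - y) ** 2
            # Extend the cheapest of the three allowed warping moves.
            cur.append(cost + min(prev[j], prev[j - 1], cur[j - 1]))
        prev = cur
    return prev[-1]

def split(series, m):
    """Cut a series into m equal-length sub-sequences (assumes m divides the length)."""
    step = len(series) // m
    return [series[i * step:(i + 1) * step] for i in range(m)]

def train_codebooks(train, m, k, seed=0):
    """One codebook per sub-sequence position.

    Assumption: centroids are k randomly sampled training sub-sequences,
    a stand-in for a proper clustering step.
    """
    rng = random.Random(seed)
    books = []
    for s in range(m):
        subs = [split(x, m)[s] for x in train]
        books.append(rng.sample(subs, k))
    return books

def encode(series, books):
    """Replace each sub-sequence by the index of its DTW-nearest centroid."""
    return [min(range(len(book)), key=lambda c: dtw(sub, book[c]))
            for sub, book in zip(split(series, len(books)), books)]

def distance_tables(books):
    """Pre-compute centroid-to-centroid DTW distances for every codebook."""
    return [[[dtw(u, v) for v in book] for u in book] for book in books]

def approx_distance(code_a, code_b, tables):
    """Approximate distance between two series using table look-ups only."""
    return sum(t[i][j] for t, i, j in zip(tables, code_a, code_b))

# Toy data: 20 random series of length 16, split into 4 parts, 5 codes per part.
rng = random.Random(1)
train = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(20)]
books = train_codebooks(train, m=4, k=5)
tables = distance_tables(books)
a, b = encode(train[0], books), encode(train[1], books)
print(a, b, approx_distance(a, b, tables))
```

In this sketch each series is stored as four small integers instead of sixteen floats, and comparing two encoded series costs four table look-ups rather than a full DTW computation. Warping is confined to each sub-sequence, which is the source of the unwanted alignments at partition boundaries that the abstract mentions.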