Paper title
Towards Symbolic Time Series Representation Improved by Kernel Density Estimators
Paper authors
Paper abstract
This paper deals with symbolic time series representation. It builds on the popular mapping technique Symbolic Aggregate approXimation (SAX), which is used extensively in sequence classification, pattern mining, anomaly detection, time series indexing, and other data mining tasks. The disadvantage of this method, however, is that it works reliably only for time series with a Gaussian-like distribution. In our previous work we proposed an improvement of SAX, called dwSAX, which can deal with Gaussian as well as non-Gaussian data distributions. Recently we have made further progress on our solution, resulting in edwSAX. Our goal was to cover the information space optimally by means of sufficient alphabet utilization, and to satisfy the lower bounding criterion as tightly as possible. We describe our approach here, including an evaluation on commonly employed tasks such as time series reconstruction error and Euclidean distance lower bounding, with promising improvements over SAX.
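For readers unfamiliar with the baseline, the following is a minimal sketch of classic SAX, not of dwSAX or edwSAX, whose details the abstract does not give. It assumes the usual three steps: z-normalization, piecewise aggregate approximation (PAA), and discretization against equiprobable breakpoints of the standard normal distribution. The function names (`znorm`, `paa`, `gaussian_breakpoints`, `sax`) are illustrative choices, and the sketch assumes the series length is divisible by the number of segments.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def znorm(series):
    """Rescale the series to zero mean and unit (population) standard deviation."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # A constant series carries no shape information; map it to zeros.
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise aggregate approximation: the mean of each equal-width segment."""
    n = len(series)
    if n % segments != 0:
        raise ValueError("this sketch assumes len(series) is divisible by segments")
    width = n // segments
    return [mean(series[i * width:(i + 1) * width]) for i in range(segments)]


def gaussian_breakpoints(alphabet_size):
    """Cut points that split the standard normal into equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax(series, segments, alphabet_size):
    """Map a numeric series to a SAX word of length `segments`."""
    letters = "abcdefghijklmnopqrstuvwxyz"[:alphabet_size]
    breakpoints = gaussian_breakpoints(alphabet_size)
    reduced = paa(znorm(series), segments)
    # bisect_right returns the index of the region each PAA value falls into.
    return "".join(letters[bisect_right(breakpoints, v)] for v in reduced)


if __name__ == "__main__":
    print(sax([1, 2, 3, 4, 5, 6, 7, 8], segments=4, alphabet_size=4))  # prints "abcd"
```

The breakpoints here come from the standard normal distribution, which is the Gaussian assumption the abstract identifies as the limitation of SAX: when the normalized values are far from Gaussian, some letters are used much more often than others and the alphabet is poorly utilized.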