Paper title
Data segmentation algorithms: Univariate mean change and beyond
Paper authors
Abstract
Data segmentation, also known as multiple change point analysis, has received considerable attention due to its importance in time series analysis and signal processing, with applications in a variety of fields including natural and social sciences, medicine, engineering and finance. In the first part of this survey, we review the existing literature on the canonical data segmentation problem, which aims at detecting and localising multiple change points in the mean of univariate time series. We provide an overview of popular methodologies with regard to their computational complexity and theoretical properties. In particular, our theoretical discussion focuses on the separation rate, which relates to which change points are detectable by a given procedure, and the localisation rate, which quantifies the precision of the corresponding change point estimators, and we distinguish between whether a homogeneous or multiscale viewpoint has been adopted in their derivation. We further highlight that the latter viewpoint provides the most general setting for investigating the optimality of data segmentation algorithms. Arguably, the canonical segmentation problem has been the most popular framework in which to propose new data segmentation algorithms and study their efficiency in recent decades. In the second part of this survey, we motivate the importance of attaining an in-depth understanding of the strengths and weaknesses of methodologies for the change point problem in a simpler, univariate setting, as a stepping stone for the development of methodologies for more complex problems. We illustrate this with a range of examples showcasing the connections between complex distributional changes and those in the mean. We also discuss extensions towards high-dimensional change point problems, where we demonstrate that the challenges arising from high dimensionality are orthogonal to those in dealing with multiple change points.
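To make the canonical problem concrete, the following is a minimal sketch of the classical CUSUM statistic for locating a single change in the mean of a univariate sequence. It is an illustration only and not a method attributed to the survey: the function names `cusum_statistics` and `estimate_single_change`, the simulated data, and the restriction to one change point are all assumptions made here, and no detection threshold is applied.

```python
import numpy as np


def cusum_statistics(x):
    """Absolute CUSUM statistic for every candidate split k = 1, ..., n-1.

    For each k, the difference between the mean of x[:k] and the mean
    of x[k:] is scaled by sqrt(k * (n - k) / n).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    k = np.arange(1, n)                    # candidate split points
    left_sums = np.cumsum(x)[:-1]          # sum of x[:k] for each k
    left_mean = left_sums / k
    right_mean = (x.sum() - left_sums) / (n - k)
    return np.sqrt(k * (n - k) / n) * np.abs(left_mean - right_mean)


def estimate_single_change(x):
    """Return (k_hat, statistic), where k_hat maximises the CUSUM statistic.

    k_hat is the number of observations before the estimated change.
    """
    stats = cusum_statistics(x)
    k_hat = int(np.argmax(stats)) + 1      # +1 because candidates start at k = 1
    return k_hat, float(stats[k_hat - 1])


# Hypothetical example: mean shifts from 0 to 2 after 100 observations,
# observed with independent standard Gaussian noise.
rng = np.random.default_rng(0)
signal = np.concatenate([np.zeros(100), np.full(100, 2.0)])
x = signal + rng.normal(scale=1.0, size=signal.size)
k_hat, stat = estimate_single_change(x)
print(f"estimated change after observation {k_hat}, statistic {stat:.2f}")
```

In practice the maximal statistic would be compared against a threshold before declaring a change, and handling multiple change points requires applying such a statistic locally or recursively; both steps are omitted from this sketch.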