Paper title
Unsupervised Functional Data Analysis via Nonlinear Dimension Reduction
Paper authors
Paper abstract
In recent years, manifold methods have moved into focus as tools for dimension reduction. Assuming that high-dimensional data actually lie on or close to a low-dimensional nonlinear manifold, these methods have shown convincing results in several settings. This manifold assumption is often reasonable for functional data as well, i.e., for data representing continuously observed functions. However, the performance of manifold methods recently proposed for tabular or image data has not yet been systematically assessed for functional data. Moreover, it is unclear how to evaluate the quality of learned embeddings that do not yield invertible mappings, since the reconstruction error cannot be used as a performance measure for such representations. In this work, we describe and investigate the specific challenges that the functional data setting poses for nonlinear dimension reduction. The contributions of the paper are threefold. First, we define a theoretical framework that allows systematic assessment of the specific challenges arising in the functional data context, transfer several nonlinear dimension reduction methods for tabular and image data to functional data, and show that manifold methods can be used successfully in this setting. Second, we subject performance assessment and tuning strategies to a thorough and systematic evaluation based on several different functional data settings and point out some previously undescribed weaknesses and pitfalls that can jeopardize reliable judgment of embedding quality. Third, we propose a nuanced approach for making trustworthy decisions for or against competing nonconforming embeddings more objectively.