Paper Title

Spatial Matrix Completion for Spatially-Misaligned and High-Dimensional Air Pollution Data

Authors

Vu, Phuong T., Szpiro, Adam A., Simon, Noah

Abstract

In health-pollution cohort studies, accurate predictions of pollutant concentrations at new locations are needed, since the locations of fixed monitoring sites and study participants are often spatially misaligned. For multi-pollutant data, principal component analysis (PCA) is often incorporated to obtain a low-rank (LR) structure of the data prior to spatial prediction. The recently developed predictive PCA modifies the traditional algorithm to improve overall predictive performance by leveraging both the LR and spatial structures within the data. However, predictive PCA requires complete data or an initial imputation step. Nonparametric imputation techniques that do not account for spatial information may distort the underlying structure of the data and thus further reduce predictive performance. We propose a convex optimization problem inspired by the LR matrix completion framework and develop a proximal algorithm to solve it. Missing data are imputed and handled concurrently within the algorithm, which eliminates the need for a separate imputation step. We show that our algorithm has a low computational burden and leads to reliable predictive performance as the severity of missing data increases.
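
To make the idea of imputing missing data inside a proximal algorithm concrete, the sketch below shows a generic nuclear-norm-regularized matrix completion solved by proximal gradient descent (often called soft-impute). It is not the authors' formulation: the spatial component described in the abstract is omitted entirely, and the function names (`svd_soft_threshold`, `complete_matrix`), the penalty `lam`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def svd_soft_threshold(Z, lam):
    """Proximal operator of lam * nuclear norm: shrink singular values by lam."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)
    return (U * s_shrunk) @ Vt


def complete_matrix(X, lam=0.5, n_iter=500, tol=1e-7):
    """Proximal gradient for min_Z 0.5 * ||P_obs(X - Z)||_F^2 + lam * ||Z||_*.

    Missing entries of X are marked with NaN. Hypothetical helper, not the
    paper's algorithm (no spatial structure is used here).
    """
    observed = ~np.isnan(X)
    X_obs = np.where(observed, X, 0.0)
    Z = np.zeros_like(X_obs)
    for _ in range(n_iter):
        # Gradient step with step size 1: keep observed values and fill the
        # missing entries with the current low-rank estimate.
        filled = np.where(observed, X_obs, Z)
        # Proximal step: singular value soft-thresholding.
        Z_new = svd_soft_threshold(filled, lam)
        converged = np.linalg.norm(Z_new - Z) <= tol * max(1.0, np.linalg.norm(Z))
        Z = Z_new
        if converged:
            break
    return Z


# Synthetic example (assumed data): 40 sites x 10 pollutants, rank 2,
# with roughly 20% of the entries removed at random.
rng = np.random.default_rng(0)
M = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 10))
mask = rng.random(M.shape) < 0.2
X = M.copy()
X[mask] = np.nan

Z_hat = complete_matrix(X)
rel_err = np.linalg.norm(Z_hat[mask] - M[mask]) / np.linalg.norm(M[mask])
print(f"relative error on held-out entries: {rel_err:.3f}")
```

Because each iteration refills the missing entries with the current low-rank estimate before thresholding, imputation and estimation happen in the same loop, which is the general mechanism the abstract alludes to when it says no separate imputation step is needed.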
