Paper Title
GEDI: A Graph-based End-to-end Data Imputation Framework
Authors
Abstract
Data imputation is an effective way to handle missing data, which is common in practical applications. In this study, we propose and test a novel data imputation process that achieves two important goals: (1) preserving the row-wise similarities among observations and the column-wise contextual relationships among features in the feature matrix, and (2) tailoring the imputation process to a specific downstream label prediction task. The proposed imputation process uses a Transformer network and graph structure learning to iteratively refine the contextual relationships among features and the similarities among observations. Moreover, it uses a meta-learning framework to select features that are influential to the downstream prediction task of interest. We conduct experiments on large real-world data sets and show that the proposed imputation process consistently improves imputation and label prediction performance over a variety of benchmark methods.
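To make the idea of imputing from row-wise similarities concrete, the following is a minimal sketch and not the GEDI method itself: it has no Transformer, no learned graph structure, and no meta-learning step. It only illustrates the general pattern of alternating between building a similarity graph over observations and refining the imputed entries from graph neighbors. The function name `knn_graph_impute`, the parameters `k` and `n_iter`, and the inverse-distance weighting are all assumptions made for illustration, and the sketch assumes every column has at least one observed value and that `k` is smaller than the number of rows.

```python
import numpy as np


def knn_graph_impute(X, k=2, n_iter=5):
    """Illustrative only: fill NaNs by iterating between a k-nearest-neighbor
    row graph and neighbor-weighted averaging. Not the authors' algorithm."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Initial guess: column means over observed entries
    # (assumes each column has at least one observed value).
    col_mean = np.nanmean(X, axis=0)
    filled = np.where(missing, col_mean, X)

    for _ in range(n_iter):
        # Pairwise squared Euclidean distances between rows of the current estimate.
        diff = filled[:, None, :] - filled[None, :, :]
        dist = (diff ** 2).sum(axis=2)
        np.fill_diagonal(dist, np.inf)  # a row is never its own neighbor

        # Graph step: keep the k nearest rows of each row (assumes k < number of rows).
        nbrs = np.argsort(dist, axis=1)[:, :k]
        nbr_dist = np.take_along_axis(dist, nbrs, axis=1)

        # Inverse-distance edge weights, normalized per row (hypothetical choice).
        w = 1.0 / (1.0 + nbr_dist)
        w = w / w.sum(axis=1, keepdims=True)

        # Imputation step: weighted average of neighbor rows, written only
        # into the cells that were missing in the input.
        nbr_avg = (w[:, :, None] * filled[nbrs]).sum(axis=1)
        filled = np.where(missing, nbr_avg, filled)

    return filled


if __name__ == "__main__":
    X = np.array([[1.0, 2.0], [1.0, np.nan], [1.1, 2.2], [10.0, 20.0]])
    print(knn_graph_impute(X, k=2, n_iter=3))
```

Observed entries are never overwritten; only the originally missing cells are updated in each pass. Because the graph is rebuilt from the current estimate, a poor initial fill can still pull a row toward the wrong neighbors, which is one motivation for learning the graph rather than fixing it by a distance rule.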