Paper Title

High-dimensional Imputation for the Social Sciences: a Comparison of State-of-the-art Methods

Authors

Costantini, Edoardo; Lang, Kyle M.; Reeskens, Tim; Sijtsma, Klaas

Abstract

Including a large number of predictors in the imputation model underlying a multiple imputation (MI) procedure is one of the most challenging tasks imputers face. A variety of high-dimensional MI techniques can help, but there has been limited research on their relative performance. In this study, we investigated a wide range of extant high-dimensional MI techniques that can handle a large number of predictors in the imputation models and general missing data patterns. We assessed the relative performance of seven high-dimensional MI methods with a Monte Carlo simulation study and a resampling study based on real survey data. The performance of the methods was defined by the degree to which they facilitate unbiased and confidence-valid estimates of the parameters of complete data analysis models. We found that using lasso penalty or forward selection to select the predictors used in the MI model and using principal component analysis to reduce the dimensionality of auxiliary data produce the best results.
