Paper title
Imputing Missing Values in the Occupational Requirements Survey
Paper authors
Paper abstract
The U.S. Bureau of Labor Statistics allows public access to much of the data acquired through its Occupational Requirements Survey (ORS). These data can be used to draw inferences about the requirements of various jobs and job classes within the United States workforce. However, the dataset contains many missing observations and estimates, which somewhat limits its utility. Here, we propose a method for imputing these missing values that leverages features inherent in the survey data, such as known population limits and correlations between occupations and tasks. To obtain a distribution of predicted values for each missing estimate, we use an iterative regression fit, implemented with a recent version of XGBoost and executed across a set of simulated values drawn from the distribution described by the known values and their standard deviations reported in the survey. This allows us to calculate a mean prediction and bound each estimate with a 95% confidence interval. We discuss the use of our method and how the resulting imputations can inform future areas of study stemming from the data collected in the ORS. Finally, we conclude with an outline of WIGEM, a generalized version of our weighted, iterative imputation algorithm that could be applied to other contexts.
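The simulation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a one-variable least-squares line stands in for XGBoost so that the example needs only the Python standard library, the iterative refitting and weighting are omitted, and the names `fit_line`, `impute_distribution`, the predictor `x`, the [0, 100] percentage bounds, and all numeric values are hypothetical. Each simulation draws the known estimates from normal distributions defined by their reported means and standard deviations, refits the regressor, and predicts the missing estimate; the collected predictions give a mean and a 95% percentile interval.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (a stand-in for XGBoost)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def impute_distribution(known, x_missing, n_sims=500, seed=0):
    """Return (mean, lower, upper) for one missing estimate.

    known: list of (x, mean, sd) tuples, where x is a hypothetical predictor
    and (mean, sd) are a reported estimate and its standard deviation.
    """
    rng = random.Random(seed)
    xs = [x for x, _, _ in known]
    preds = []
    for _ in range(n_sims):
        # Draw one simulated value per known estimate.
        ys = [rng.gauss(m, s) for _, m, s in known]
        a, b = fit_line(xs, ys)
        # Assumed population limit: estimates are percentages in [0, 100].
        preds.append(min(100.0, max(0.0, a + b * x_missing)))
    preds.sort()
    lower = preds[int(0.025 * (n_sims - 1))]
    upper = preds[int(0.975 * (n_sims - 1))]
    return statistics.fmean(preds), lower, upper


# Made-up example values, not ORS data.
known = [(10.0, 22.0, 1.5), (20.0, 41.0, 2.0), (30.0, 58.0, 2.5), (40.0, 83.0, 3.0)]
mean_pred, lower, upper = impute_distribution(known, x_missing=25.0)
print(f"mean={mean_pred:.2f}, 95% interval=({lower:.2f}, {upper:.2f})")
```

Fixing the seed keeps the run reproducible. In a fuller version, the regressor would be replaced by a gradient-boosted model and the predictions fed back into subsequent fits, as the abstract describes.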