Paper Title
Methods for Combining Probability and Nonprobability Samples Under Unknown Overlaps
Paper Authors
Paper Abstract
Nonprobability (convenience) samples are increasingly sought to increase the effective sample size, and thereby stabilize estimates, for one or more population variables of interest that are estimated from a randomized survey (reference) sample. An estimate of a population quantity derived from a convenience sample is typically biased, because the distribution of the variables of interest in the convenience sample differs from that in the population. A recent set of approaches estimates inclusion probabilities for convenience sample units, conditional on sampling design predictors, by specifying reference sample-weighted pseudo likelihoods. As our main result, this paper introduces a novel approach that derives the propensity score for the observed sample as a function of the conditional inclusion probabilities for the reference and convenience samples. Our approach allows specification of an exact likelihood for the observed sample. We construct a Bayesian hierarchical formulation that simultaneously estimates sample propensity scores and both conditional and reference sample inclusion probabilities for the convenience sample units. We compare our exact likelihood with the pseudo likelihoods in a Monte Carlo simulation study.
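The two ideas in the abstract can be illustrated with a small toy simulation. The sketch below is not the paper's model or its Bayesian estimator: the population, the logistic form of the convenience inclusion probability `pi_c`, the constant reference inclusion probability `pi_r`, and the function name `simulate` are all assumptions made for illustration. It shows, first, that an unweighted convenience sample mean is biased when inclusion depends on a predictor of the outcome, and that weighting by the inverse of the (here known) inclusion probability removes most of that bias. Second, it stacks the two independently drawn samples without identifying overlapping units and compares the observed share of convenience rows with the share `pi_c / (pi_c + pi_r)`, which is one form a propensity score for the observed sample can take as a function of the two inclusion probabilities; this expression is an assumption of the sketch and is not quoted from the paper.

```python
import math
import random


def simulate(seed=0, n_pop=50_000, pi_r=0.1):
    """Toy finite population with a convenience and a reference sample.

    All distributions and parameter values are hypothetical.
    """
    rng = random.Random(seed)
    pop_y = []
    conv_y, conv_w = [], []  # convenience sample outcomes and 1/pi_c weights
    z, p = [], []            # stacked-sample indicator and implied propensity

    for _ in range(n_pop):
        x = rng.gauss(0.0, 1.0)              # design predictor
        y = 2.0 + x + rng.gauss(0.0, 0.5)    # variable of interest
        pop_y.append(y)

        # Assumed convenience inclusion probability, increasing in x.
        pi_c = 1.0 / (1.0 + math.exp(2.0 - x))
        # Implied probability that a stacked row at this x is a convenience row.
        share = pi_c / (pi_c + pi_r)

        if rng.random() < pi_c:   # unit joins the convenience sample
            conv_y.append(y)
            conv_w.append(1.0 / pi_c)
            z.append(1)
            p.append(share)
        if rng.random() < pi_r:   # unit independently joins the reference sample
            z.append(0)
            p.append(share)
        # A unit selected twice appears as two rows; the overlap is not flagged.

    weighted = sum(w * v for w, v in zip(conv_w, conv_y)) / sum(conv_w)
    return {
        "pop_mean": sum(pop_y) / n_pop,
        "naive_mean": sum(conv_y) / len(conv_y),
        "weighted_mean": weighted,
        "observed_share": sum(z) / len(z),
        "implied_share": sum(p) / len(p),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

In this sketch the inclusion probabilities are known by construction; the problem the abstract addresses is that `pi_c` is unknown in practice and has to be estimated, which the toy code does not attempt.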