Title
Estimation and Inference with Proxy Data and its Genetic Applications
Authors
Abstract
Existing high-dimensional statistical methods are largely established for analyzing individual-level data. In this work, we study estimation and inference for high-dimensional linear models where we only observe "proxy data", which include marginal statistics and a sample covariance matrix computed from different sets of individuals. We develop a rate-optimal method for estimation and inference for the regression coefficient vector and its linear functionals based on the proxy data. Moreover, we show intrinsic limitations of proxy-data-based inference: the minimax optimal rate for estimation is slower than in the conventional case where individual-level data are observed, and the power for testing and multiple testing does not go to one as the signal strength goes to infinity. These findings are illustrated through simulation studies and an analysis of a dataset concerning the genetic associations of hindlimb muscle weight in a mouse population.
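To make the notion of "proxy data" concrete, the following is a minimal sketch of how such data could arise in a simulation. It is an illustration under stated assumptions, not the authors' method: the abstract does not specify the estimator, so none is implemented here. The helper names (`simulate_design`, `marginal_statistics`, `sample_covariance`, `make_proxy_data`), the Gaussian design with identity covariance, the sample sizes, and the coefficient vector `BETA` are all hypothetical choices made for this example.

```python
import random

# Hypothetical sparse coefficient vector, chosen only for illustration.
BETA = (2.0, 0.0, 0.0, -1.0, 0.0)


def simulate_design(rng, n, p):
    """Draw an n x p design with independent standard normal entries (assumption)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]


def marginal_statistics(X, y):
    """Return the p-vector X^T y / n of marginal (one-feature-at-a-time) statistics."""
    n, p = len(X), len(X[0])
    return [sum(X[i][j] * y[i] for i in range(n)) / n for j in range(p)]


def sample_covariance(X):
    """Return the p x p matrix X^T X / n (uncentered; columns have mean zero by design)."""
    n, p = len(X), len(X[0])
    return [[sum(row[j] * row[k] for row in X) / n for k in range(p)]
            for j in range(p)]


def make_proxy_data(seed=0, n=5000, n_ref=2000, beta=BETA):
    """Simulate proxy data: marginal statistics from one sample and a
    sample covariance matrix from a different, independent sample."""
    rng = random.Random(seed)
    p = len(beta)
    # Sample 1: individual-level data, used only to form the marginal statistics.
    X = simulate_design(rng, n, p)
    y = [sum(x * b for x, b in zip(row, beta)) + rng.gauss(0.0, 1.0) for row in X]
    S = marginal_statistics(X, y)
    # Sample 2: an independent reference sample, used only for the covariance.
    X_ref = simulate_design(rng, n_ref, p)
    Sigma_hat = sample_covariance(X_ref)
    return S, Sigma_hat


S, Sigma_hat = make_proxy_data()
# In a linear model, E[X^T y / n] = Sigma * beta, so these two vectors
# should agree up to sampling noise.
Sigma_hat_beta = [sum(Sigma_hat[j][k] * BETA[k] for k in range(len(BETA)))
                  for j in range(len(BETA))]
print("marginal statistics:", [round(s, 3) for s in S])
print("Sigma_hat @ beta:   ", [round(v, 3) for v in Sigma_hat_beta])
```

Only `S` and `Sigma_hat` would be available to the analyst in the proxy-data setting; the individual-level `X` and `y` are discarded. Because the two summaries come from different samples, the discrepancy between `S` and `Sigma_hat @ beta` carries sampling noise from both, which is one intuitive reason inference from proxy data can be harder than inference from individual-level data.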