Paper Title

Two-stage Hypothesis Tests for Variable Interactions with FDR Control

Authors

Jingyi Duan, Yang Ning, Xi Chen, Yong Chen

Abstract

In many scenarios, such as genome-wide association studies, where dependencies between variables commonly exist, it is often of interest to infer the interaction effects in the model. However, testing pairwise interactions among millions of variables in complex, high-dimensional data suffers from low statistical power and huge computational cost. To address these challenges, we propose a two-stage testing procedure with false discovery rate (FDR) control, which is known as a less conservative multiple-testing correction. Theoretically, the difficulty in FDR control is due to the dependence among test statistics in the two stages, and to the fact that the number of hypothesis tests conducted in the second stage depends on the screening result of the first stage. Using the Cramér-type moderate deviation technique, we show that our procedure controls the FDR at the desired level asymptotically in the generalized linear model (GLM), where the model is allowed to be misspecified. In addition, the asymptotic power of the FDR control procedure is rigorously established. We demonstrate via comprehensive simulation studies that our two-stage procedure is computationally more efficient than the classical BH procedure, with comparable or improved statistical power. Finally, we apply the proposed method to a bladder cancer dataset from dbGaP, where the scientific goal is to identify genetic susceptibility loci for bladder cancer.
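
For readers unfamiliar with the baseline mentioned in the abstract, the following is a minimal sketch of the classical Benjamini-Hochberg (BH) step-up rule. It is not the two-stage procedure proposed in the paper; the function name `benjamini_hochberg` and the example p-values are illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices of hypotheses rejected by the classical BH rule.

    Illustrative sketch only: this is the single-stage baseline, not the
    paper's two-stage procedure.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= alpha * k / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    return sorted(order[:k])


# Hypothetical p-values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(example_p, alpha=0.05)
```

Applied to all pairwise interactions among p variables, this baseline requires on the order of p² tests, which is the computational burden that the first-stage screening described in the abstract aims to reduce.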
