Paper Title
Bayesian Structural Learning with Parametric Marginals for Count Data: An Application to Microbiota Systems
Authors
Abstract
High-dimensional and heterogeneous count data are collected in many applied fields. In this paper, we look closely at high-resolution sequencing data on the microbiome, which have enabled researchers to study the genomes of entire microbial communities. Revealing the underlying interactions between these communities is vital to learning how microbes influence human health. To perform structural learning from multivariate count data such as these, we develop a novel Gaussian copula graphical model with two key elements. First, we employ parametric regression to characterize the marginal distributions. This step is crucial for accommodating the impact of external covariates; neglecting this adjustment could distort inference of the underlying dependence network. Second, we advance a Bayesian structure learning framework based on a computationally efficient search algorithm that is suited to high dimensionality. The approach returns simultaneous inference of the marginal effects and of the dependence structure, including graph uncertainty estimates. A simulation study and a real data analysis of microbiome data highlight the applicability of the proposed approach to inferring networks from multivariate count data in general, and its relevance to microbiome analyses in particular. The proposed method is implemented in the R package BDgraph.
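As a rough illustration of the first element described in the abstract, the sketch below shows how a count can be linked to a latent Gaussian variable through a covariate-dependent parametric marginal. It is not the authors' implementation and does not use BDgraph. The Poisson log-linear marginal is an assumption made only for this example (the abstract does not name the marginal family), and the names `poisson_cdf`, `probit`, and `latent_interval` are hypothetical. In a Gaussian copula model for discrete data, an observed count y constrains the latent score to the interval between the probit of F(y - 1 | x) and the probit of F(y | x), where F is the marginal CDF given covariate x.

```python
import math
from statistics import NormalDist


def poisson_cdf(k, mean):
    """P(Y <= k) for Y ~ Poisson(mean); returns 0.0 for k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-mean)  # P(Y = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i  # P(Y = i) from P(Y = i - 1)
        total += term
    return min(total, 1.0)


def probit(p):
    """Standard normal quantile, extended to -inf / +inf at p = 0 / p = 1."""
    if p <= 0.0:
        return -math.inf
    if p >= 1.0:
        return math.inf
    return NormalDist().inv_cdf(p)


def latent_interval(count, covariate, intercept, slope):
    """Interval of latent Gaussian scores consistent with an observed count.

    Assumption for illustration: the marginal is Poisson with
    log(mean) = intercept + slope * covariate.
    """
    mean = math.exp(intercept + slope * covariate)
    lower = probit(poisson_cdf(count - 1, mean))
    upper = probit(poisson_cdf(count, mean))
    return lower, upper


if __name__ == "__main__":
    # Same count, different covariate values: the interval shifts with the
    # covariate, which is the adjustment the regression marginals provide.
    print(latent_interval(3, covariate=0.0, intercept=0.0, slope=0.5))
    print(latent_interval(3, covariate=2.0, intercept=0.0, slope=0.5))
```

Under these assumptions, the same observed count maps to different latent intervals depending on the covariate, so dependence between taxa is assessed after the covariate effect on each marginal has been accounted for.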