Paper title
Sparse precision matrix estimation in phenotypic trait evolution models
Paper authors
Paper abstract
Phylogenetic trait evolution models allow for the estimation of evolutionary correlations between a set of traits observed in a sample of related organisms. By directly modeling the evolution of the traits along an estimable phylogenetic tree, the model's structure effectively controls for shared evolutionary history. In these models, relevant correlations are usually assessed through the highest posterior density (HPD) intervals of their marginal distributions. However, the selected correlations alone may not provide the full picture of trait relationships. In contrast, their association structure, expressed through a graph that encodes partial correlations, can highlight sparsity patterns that reveal direct associations between traits. To develop a model-based method for identifying this association structure, we explore the use of Gaussian graphical models (GGMs) for covariance selection. We model the precision matrix with a G-Wishart conjugate prior, which results in sparse precision estimates. Furthermore, the model naturally allows for Bayes factor tests of association between the traits, with no additional computation required. We evaluate our approach through Monte Carlo simulations and through applications that examine the association structure and evolutionary correlations of phenotypic traits in Darwin's finches and of genomic and phenotypic traits in prokaryotes. Our approach provides accurate graph estimates and lower errors for the precision and correlation parameter estimates, particularly for conditionally independent traits, which are the target for sparsity in GGMs.
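The distinction the abstract draws between marginal correlations and the partial correlations encoded by a GGM can be illustrated with a small numerical sketch. This is not the paper's method (no phylogeny, no G-Wishart prior, no sampling); it only uses the standard identity that the partial correlation between traits i and j equals -K_ij / sqrt(K_ii K_jj) for a precision matrix K. The three-trait matrix `K` and all function names below are hypothetical, chosen for illustration.

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I]
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def correlations(cov):
    """Marginal correlations from a covariance matrix."""
    n = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n)]
            for i in range(n)]

def partial_correlations(precision):
    """Partial correlations from a precision matrix: -K_ij / sqrt(K_ii * K_jj)."""
    n = len(precision)
    return [[1.0 if i == j
             else -precision[i][j] / math.sqrt(precision[i][i] * precision[j][j])
             for j in range(n)] for i in range(n)]

def graph_edges(precision, tol=1e-10):
    """Edges of the graph: pairs whose off-diagonal precision entry is non-zero."""
    n = len(precision)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(precision[i][j]) > tol]

# Hypothetical precision matrix for three traits arranged in a chain 0 - 1 - 2:
# traits 0 and 2 are conditionally independent given trait 1 (K[0][2] == 0).
K = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]

Sigma = invert(K)                  # implied covariance matrix
marginal = correlations(Sigma)     # traits 0 and 2 are marginally correlated
partial = partial_correlations(K)  # but their partial correlation is zero
edges = graph_edges(K)             # only the direct associations remain

print("marginal cor(0, 2):", round(marginal[0][2], 4))
print("partial cor(0, 2):", partial[0][2])
print("graph edges:", edges)
```

In this toy case traits 0 and 2 have a non-zero marginal correlation even though the graph has no edge between them, which is the kind of indirect association that a sparse precision matrix is meant to separate from direct ones.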