Paper title

Incorporating compositional heterogeneity into Lie Markov models for phylogenetic inference

Authors

Hannaford, Naomi E.; Heaps, Sarah E.; Nye, Tom M. W.; Williams, Tom A.; Embley, T. Martin

Abstract

Phylogenetics uses alignments of molecular sequence data to learn about evolutionary trees. Substitutions in sequences are modelled through a continuous-time Markov process, characterised by an instantaneous rate matrix, which standard models assume is time-reversible and stationary. These assumptions are biologically questionable and induce a likelihood function which is invariant to a tree's root position. This hampers inference because a tree's biological interpretation depends critically on where it is rooted. Relaxing both assumptions, we introduce a model whose likelihood can distinguish between rooted trees. The model is non-stationary, with step changes in the instantaneous rate matrix at each speciation event. Exploiting recent theoretical work, each rate matrix belongs to a non-reversible family of Lie Markov models. These models are closed under matrix multiplication, so our extension offers the conceptually appealing property that a tree and all its sub-trees could have arisen from the same family of non-stationary models. We adopt a Bayesian approach, describe an MCMC algorithm for posterior inference and provide software. The biological insight that our model can provide is illustrated through an analysis in which non-reversible but stationary, and non-stationary but reversible models cannot identify a plausible root.
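
The root-invariance claim in the abstract can be checked numerically on the smallest possible case. The following is a toy sketch only, not the authors' model or software: it places a root at different positions on the single branch joining two sequences and compares log-likelihoods under a reversible rate matrix (an HKY-style construction, chosen here as an assumption) and a non-reversible one (a cyclic matrix with a uniform stationary distribution, also an assumption). All function names, rate values, and site patterns are hypothetical and chosen purely for illustration.

```python
import math

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(q, t, squarings=10, terms=12):
    """Approximate exp(q * t) by scaling and squaring with a Taylor series."""
    n = len(q)
    scale = t / (2 ** squarings)
    m = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def with_diagonal(off):
    """Fill the diagonal so that every row of the rate matrix sums to zero."""
    q = [row[:] for row in off]
    for i in range(len(q)):
        q[i][i] = -sum(off[i][j] for j in range(len(q)) if j != i)
    return q

def reversible_q(pi, kappa=2.0):
    """HKY-style reversible matrix: q_ij = s_ij * pi_j with symmetric s_ij."""
    transitions = {(0, 2), (2, 0), (1, 3), (3, 1)}  # A<->G, C<->T
    off = [[0.0 if i == j
            else (kappa if (i, j) in transitions else 1.0) * pi[j]
            for j in range(4)] for i in range(4)]
    return with_diagonal(off)

def cyclic_q(forward, backward):
    """Non-reversible when forward != backward; stationary law is uniform."""
    off = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        off[i][(i + 1) % 4] = forward
        off[i][(i - 1) % 4] = backward
    return with_diagonal(off)

def pair_log_likelihood(q, pi, pairs, total, root_pos):
    """Log-likelihood of two aligned sequences joined by a path of length
    `total`, with the root `root_pos` from the first sequence and the root
    state drawn from the stationary distribution `pi`."""
    p_left = mat_exp(q, root_pos)
    p_right = mat_exp(q, total - root_pos)
    return sum(
        math.log(sum(pi[r] * p_left[r][a] * p_right[r][b] for r in range(4)))
        for a, b in pairs
    )

# Hypothetical site patterns (state in sequence 1, state in sequence 2).
pairs = [(0, 0), (0, 1), (1, 2), (2, 3), (3, 3)]
pi_rev = [0.1, 0.2, 0.3, 0.4]
pi_uniform = [0.25] * 4

for root_pos in (0.1, 0.25, 0.4):
    rev = pair_log_likelihood(reversible_q(pi_rev), pi_rev, pairs, 0.5, root_pos)
    non = pair_log_likelihood(cyclic_q(1.0, 0.1), pi_uniform, pairs, 0.5, root_pos)
    print(f"root at {root_pos:.2f}: reversible {rev:.6f}, non-reversible {non:.6f}")
```

Under the reversible matrix the log-likelihood should agree across root positions up to numerical error, whereas under the cyclic matrix it should change as the root moves. This is the property that lets a non-reversible or non-stationary model carry information about the root.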
