Paper title
Incorporating compositional heterogeneity into Lie Markov models for phylogenetic inference
Paper authors
Paper abstract
Phylogenetics uses alignments of molecular sequence data to learn about evolutionary trees. Substitutions in sequences are modelled through a continuous-time Markov process, characterised by an instantaneous rate matrix, which standard models assume is time-reversible and stationary. These assumptions are biologically questionable and induce a likelihood function which is invariant to a tree's root position. This hampers inference because a tree's biological interpretation depends critically on where it is rooted. Relaxing both assumptions, we introduce a model whose likelihood can distinguish between rooted trees. The model is non-stationary, with step changes in the instantaneous rate matrix at each speciation event. Exploiting recent theoretical work, each rate matrix belongs to a non-reversible family of Lie Markov models. These models are closed under matrix multiplication, so our extension offers the conceptually appealing property that a tree and all its sub-trees could have arisen from the same family of non-stationary models. We adopt a Bayesian approach, describe an MCMC algorithm for posterior inference and provide software. The biological insight that our model can provide is illustrated through an analysis in which non-reversible but stationary, and non-stationary but reversible models cannot identify a plausible root.
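The claim that time-reversible, stationary models give a likelihood that is invariant to root position can be checked numerically on a toy example. The following is a minimal sketch, not the paper's model or software: it uses a two-leaf tree, a single site pattern, and two hand-picked 4-state rate matrices (a GTR-style reversible one and a circulant non-reversible one with a uniform stationary distribution). All function names, rates and branch lengths are assumptions chosen for illustration, and the matrix exponential is a simple truncated series rather than a production routine.

```python
# Illustrative sketch only: toy rate matrices, not the paper's Lie Markov models.

def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(q, t, squarings=10, terms=12):
    """Approximate exp(q * t) by scaling and squaring a truncated Taylor series."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result

def rate_matrix(off_diag):
    """Fill in the diagonal so that every row sums to zero."""
    n = len(off_diag)
    q = [[float(x) for x in row] for row in off_diag]
    for i in range(n):
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q

def two_leaf_likelihood(q, pi, t1, t2, i, j):
    """P(leaf 1 = i, leaf 2 = j): root state drawn from pi, branch lengths t1 and t2."""
    p1, p2 = expm(q, t1), expm(q, t2)
    return sum(pi[x] * p1[x][i] * p2[x][j] for x in range(len(pi)))

# Reversible toy model (GTR-style): Q[i][j] = s[i][j] * pi[j] with symmetric s,
# so pi is stationary and detailed balance holds.
pi_rev = [0.1, 0.2, 0.3, 0.4]
s = [[0.0, 1.0, 2.0, 0.5],
     [1.0, 0.0, 0.7, 1.5],
     [2.0, 0.7, 0.0, 1.2],
     [0.5, 1.5, 1.2, 0.0]]
q_rev = rate_matrix([[s[i][j] * pi_rev[j] for j in range(4)] for i in range(4)])

# Non-reversible but stationary toy model: circulant rates. Rows and columns
# both sum to zero, so the uniform distribution is stationary, but the
# "forward" rate differs from the "backward" rate, breaking detailed balance.
steps = [0.0, 1.0, 0.3, 0.2]  # rate of moving 0, 1, 2, 3 states forward (mod 4)
q_nonrev = rate_matrix([[steps[(j - i) % 4] for j in range(4)] for i in range(4)])
pi_unif = [0.25, 0.25, 0.25, 0.25]

# Slide the root along the single path joining the two leaves (total length 1.0)
# and compare the likelihood of observing state 0 at leaf 1 and state 1 at leaf 2.
rev_diff = abs(two_leaf_likelihood(q_rev, pi_rev, 0.1, 0.9, 0, 1)
               - two_leaf_likelihood(q_rev, pi_rev, 0.9, 0.1, 0, 1))
nonrev_diff = abs(two_leaf_likelihood(q_nonrev, pi_unif, 0.1, 0.9, 0, 1)
                  - two_leaf_likelihood(q_nonrev, pi_unif, 0.9, 0.1, 0, 1))

print("reversible: change in likelihood when the root moves =", rev_diff)
print("non-reversible: change in likelihood when the root moves =", nonrev_diff)
```

In this sketch the reversible model's likelihood is unchanged (up to floating-point error) when the root moves, while the non-reversible model's likelihood shifts, which is the kind of signal a rooted analysis relies on. The paper's model goes further by also relaxing stationarity, which this toy example does not attempt to reproduce.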