Paper title
Geometric algorithms for sampling the flux space of metabolic networks
Paper authors
Paper abstract
Systems Biology is a fundamental field and paradigm that introduces a new era in Biology. The crux of its functionality and usefulness relies on metabolic networks, which model the reactions occurring inside an organism and provide the means to understand the underlying mechanisms that govern biological systems. Moreover, metabolic networks have a broader impact that ranges from the resolution of ecosystems to personalized medicine. The analysis of metabolic networks is a computational-geometry-oriented field, as one of the main operations it depends on is sampling points uniformly from polytopes; the latter provide a representation of the steady states of the metabolic networks. However, the polytopes that result from biological data are of very high dimension (on the order of thousands) and, in most if not all cases, are considerably skinny. Therefore, to perform uniform random sampling efficiently in this setting, we need a novel algorithmic and computational framework specially tailored to the properties of metabolic networks. We present a complete software framework to handle sampling in metabolic networks. Its backbone is a Multiphase Monte Carlo Sampling (MMCS) algorithm that unifies rounding and sampling in one pass, obtaining both upon termination. It exploits an improved variant of the Billiard Walk that enjoys faster arithmetic complexity per step. We demonstrate the efficiency of our approach by performing extensive experiments on various metabolic networks. Notably, sampling on the most complicated human metabolic network accessible today, Recon3D, corresponding to a polytope of dimension 5 335, took less than 30 hours. To our knowledge, that is out of reach for existing software.
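To make the Billiard Walk mentioned in the abstract concrete, here is a minimal sketch of one step of the textbook Billiard Walk, not the improved variant or the MMCS framework that the paper describes. It assumes the polytope is given in the form {y : A y <= b} with a known interior starting point. The function name `billiard_walk_step` and the parameters `tau` (trajectory length scale) and `max_reflections` are illustrative choices, not names taken from the paper or its software.

```python
import math
import random


def billiard_walk_step(A, b, x, tau, max_reflections, rng):
    """One textbook Billiard Walk step inside {y : A y <= b} (illustrative sketch).

    A is a list of rows, b a list of bounds, x an interior point.
    Returns the new point, or a copy of x if the trajectory is rejected.
    """
    n = len(x)
    # Uniform random direction: normalize a standard Gaussian vector.
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    v = [c / norm for c in v]
    # Trajectory length L = -tau * ln(eta), with eta uniform in (0, 1].
    remaining = -tau * math.log(1.0 - rng.random())
    p = list(x)
    for _ in range(max_reflections + 1):
        # Find the first facet hit by the ray p + t * v, t >= 0.
        t_hit, hit = math.inf, None
        for i, (row, bi) in enumerate(zip(A, b)):
            av = sum(r * c for r, c in zip(row, v))
            if av > 1e-12:  # only facets we are moving towards
                ap = sum(r * c for r, c in zip(row, p))
                t = max((bi - ap) / av, 0.0)  # clamp rounding noise
                if t < t_hit:
                    t_hit, hit = t, i
        if remaining <= t_hit:
            # The rest of the trajectory fits before the next facet.
            return [pc + remaining * vc for pc, vc in zip(p, v)]
        # Move to the facet and reflect the direction about its normal.
        p = [pc + t_hit * vc for pc, vc in zip(p, v)]
        remaining -= t_hit
        a = A[hit]
        scale = 2.0 * sum(r * c for r, c in zip(a, v)) / sum(r * r for r in a)
        v = [vc - scale * ac for vc, ac in zip(v, a)]
    # More than max_reflections reflections were needed: reject, stay at x.
    return list(x)


if __name__ == "__main__":
    # A thin box [0, 1] x [0, 0.1], standing in for a "skinny" polytope.
    A = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
    b = [1.0, 0.0, 0.1, 0.0]
    rng = random.Random(0)
    point = [0.5, 0.05]
    for _ in range(5):
        point = billiard_walk_step(A, b, point, 1.0, 100, rng)
        print(point)
```

In a skinny polytope such as the thin box above, a trajectory reflects many times along the narrow direction. This is one intuition for why the abstract pairs sampling with rounding. The sketch itself performs no rounding.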