Title
Interaction Networks from Discrete Event Data by Poisson Multivariate Mutual Information Estimation and Information Flow with Applications from Gene Expression Data
Authors
Abstract
In this work, we introduce a new methodology for inferring the interaction structure of discrete-valued time series that are Poisson distributed. While most related methods are premised on continuous-state stochastic processes, discrete, counting-event stochastic processes, the so-called time-point processes (TPP), are in fact natural and common. An important application that we focus on here is gene expression. Nonparametric methods such as the popular k-nearest neighbors (KNN) estimators converge slowly for discrete processes and are therefore data hungry. With the new multivariate Poisson estimator developed here as the core computational engine, the causation entropy (CSE) principle, together with its associated greedy search algorithm, optimal CSE (oCSE), allows us to efficiently infer the true network structure for this class of stochastic processes, which was previously impractical. We illustrate the power of our method first by benchmarking on synthetic data, and then by inferring the genetic factor network from a breast cancer micro-RNA (miRNA) sequence count data set. We show that Poisson oCSE gives the best performance among the tested methods and discovers previously known interactions in the breast cancer data set.
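The abstract does not spell out the estimator, so the following is only a minimal Python sketch of the general idea behind a parametric (Poisson) mutual-information estimate, as opposed to a nonparametric KNN estimate. It assumes a common-shock bivariate Poisson model, X = A + C and Y = B + C with independent A ~ Poisson(λ1), B ~ Poisson(λ2), C ~ Poisson(λ0); this construction and the function names `poisson_entropy`, `bivariate_poisson_mi` and `estimate_mi` are hypothetical illustrations, not necessarily the multivariate Poisson estimator developed in the paper. The parameters are fit by the method of moments (λ0 from the covariance, λ1 and λ2 from the means minus λ0), and the mutual information is then evaluated from the model by summing over a truncated support.

```python
import math
from collections.abc import Sequence
from statistics import fmean


def poisson_logpmf(k: int, lam: float) -> float:
    """log P(K = k) for K ~ Poisson(lam); lam == 0 is a point mass at 0."""
    if lam == 0.0:
        return 0.0 if k == 0 else float("-inf")
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def _kmax(lam: float) -> int:
    """Truncation point beyond which the Poisson tail mass is negligible."""
    return int(lam + 10.0 * math.sqrt(lam) + 25)


def poisson_entropy(lam: float) -> float:
    """Shannon entropy (nats) of Poisson(lam) by truncated summation."""
    h = 0.0
    for k in range(_kmax(lam) + 1):
        lp = poisson_logpmf(k, lam)
        if lp > -math.inf:
            h -= math.exp(lp) * lp
    return h


def bivariate_pmf(x: int, y: int, l1: float, l2: float, l0: float) -> float:
    """P(X=x, Y=y) under the assumed common-shock model X=A+C, Y=B+C."""
    total = 0.0
    for k in range(min(x, y) + 1):  # k = value of the shared component C
        total += math.exp(poisson_logpmf(x - k, l1)
                          + poisson_logpmf(y - k, l2)
                          + poisson_logpmf(k, l0))
    return total


def bivariate_poisson_mi(l1: float, l2: float, l0: float) -> float:
    """I(X;Y) in nats; the marginals are Poisson(l1+l0) and Poisson(l2+l0)."""
    lx, ly = l1 + l0, l2 + l0
    mi = 0.0
    for x in range(_kmax(lx) + 1):
        for y in range(_kmax(ly) + 1):
            p = bivariate_pmf(x, y, l1, l2, l0)
            if p > 0.0:
                mi += p * (math.log(p) - poisson_logpmf(x, lx) - poisson_logpmf(y, ly))
    return mi


def estimate_mi(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Method-of-moments fit of the common-shock model, then its MI."""
    mx, my = fmean(xs), fmean(ys)
    cov = fmean([(a - mx) * (b - my) for a, b in zip(xs, ys)])
    # The shared rate must satisfy 0 <= l0 <= min(mean_x, mean_y).
    l0 = max(0.0, min(cov, mx, my))
    return bivariate_poisson_mi(mx - l0, my - l0, l0)
```

Because the shared component C can only add positive covariance, this model cannot represent negative dependence: `estimate_mi` clips λ0 at zero, so negatively correlated counts are reported as independent. Causation entropy is a conditional mutual information, so using this idea inside a CSE/oCSE search would need a multivariate extension of the same parametric approach.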