Increased peak detection accuracy in over-dispersed ChIP-seq data with supervised segmentation models

Paper title

Increased peak detection accuracy in over-dispersed ChIP-seq data with supervised segmentation models

Authors

Arnaud Liehrmann, Guillem Rigaill, Toby Dylan Hocking

Abstract

Motivation: Histone modification constitutes a basic mechanism for the genetic regulation of gene expression. In the early 2000s, a powerful technique emerged that couples chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq). This technique provides a direct survey of the DNA regions associated with these modifications. To realize the full potential of this technique, increasingly sophisticated statistical algorithms have been developed or adapted to analyze the massive amount of data it generates. Many of these algorithms were built around natural assumptions, such as a Poisson model for the noise in the count data. In this work we start from these natural assumptions and show that it is possible to improve upon them.

Results: Our comparisons on seven reference datasets of histone modifications (H3K36me3 and H3K4me3) suggest that natural assumptions are not always realistic under application conditions. We show that the unconstrained multiple changepoint detection model, with alternative noise assumptions and a suitable setup, reduces the over-dispersion exhibited by count data and detects peaks more accurately than algorithms that rely on these natural assumptions.
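To make the term "over-dispersion" concrete: under a Poisson model the variance of the counts equals their mean, so the variance-to-mean ratio is close to 1, and values well above 1 indicate over-dispersion. The sketch below is only a toy illustration of that ratio and is not the paper's method. The function name `dispersion_index` and both count vectors are hypothetical and were made up for this example.

```python
from statistics import mean, pvariance


def dispersion_index(counts):
    """Return the variance-to-mean ratio of a sequence of counts.

    A ratio near 1 is what a Poisson model predicts; a ratio well
    above 1 indicates over-dispersion.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; dispersion index is undefined")
    return pvariance(counts) / m


# Hypothetical toy counts (not taken from the paper's datasets).
# Both vectors have the same mean (5) but very different spread.
near_poisson = [2, 5, 8, 4, 8, 2, 5, 6]
bursty = [0, 0, 1, 0, 20, 0, 1, 18]

print(round(dispersion_index(near_poisson), 2))  # 0.95
print(round(dispersion_index(bursty), 2))        # 13.15
```

Both toy vectors share a mean of 5, so the difference in the ratio comes from the variance alone. That gap is the kind of departure from the Poisson assumption the abstract refers to.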
