Paper title
Graph Neural Networks for Microbial Genome Recovery
Paper authors
Paper abstract
Microbes have a profound impact on our health and environment, but our understanding of the diversity and function of microbial communities is severely limited. Through DNA sequencing of microbial communities (metagenomics), DNA fragments (reads) of the individual microbes can be obtained, which through assembly graphs can be combined into long contiguous DNA sequences (contigs). Given the complexity of microbial communities, single-contig microbial genomes are rarely obtained. Instead, contigs are eventually clustered into bins, with each bin ideally making up a full genome. This process is referred to as metagenomic binning. Current state-of-the-art techniques for metagenomic binning rely only on the local features of the individual contigs. These techniques therefore fail to exploit the similarities between contigs as encoded by the assembly graph, in which the contigs are organized. In this paper, we propose to use Graph Neural Networks (GNNs) to leverage the assembly graph when learning contig representations for metagenomic binning. Our method, VaeG-Bin, combines variational autoencoders for learning latent representations of the individual contigs with GNNs for refining these representations by taking into account the neighborhood structure of the contigs in the assembly graph. We explore several types of GNNs and demonstrate that VaeG-Bin recovers more high-quality genomes than other state-of-the-art binners on both simulated and real-world datasets.
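The idea of refining per-contig representations with the neighborhood structure of the assembly graph can be illustrated with a small sketch. This is not the VaeG-Bin implementation: the contig names, the two-dimensional "latent" vectors, the single edge, the `refine_embeddings` helper, and its `self_weight` parameter are all hypothetical, and the untrained neighbor-mean step stands in for a learned GNN layer. It only shows how contigs linked in the graph are pulled toward each other before clustering into bins.

```python
# Toy illustration only: hypothetical data and an untrained mean aggregator,
# standing in for a learned GNN layer applied to VAE latent vectors.
from math import dist


def refine_embeddings(embeddings, edges, self_weight=0.5):
    """Apply one round of neighbor-mean smoothing over an undirected graph.

    embeddings:  dict mapping contig name -> list of floats (stand-in latents)
    edges:       iterable of (contig_a, contig_b) pairs from the assembly graph
    self_weight: share of a contig's own vector kept in the refined vector
    """
    # Build an undirected adjacency map from the edge list.
    neighbors = {name: set() for name in embeddings}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    refined = {}
    for name, vec in embeddings.items():
        nbrs = neighbors[name]
        if not nbrs:
            # Isolated contig: there is no neighborhood to aggregate.
            refined[name] = list(vec)
            continue
        # Mean of the neighbors' original vectors, dimension by dimension.
        mean = [
            sum(embeddings[n][i] for n in nbrs) / len(nbrs)
            for i in range(len(vec))
        ]
        # Blend the contig's own vector with its neighborhood mean.
        refined[name] = [
            self_weight * v + (1 - self_weight) * m for v, m in zip(vec, mean)
        ]
    return refined


# Hypothetical latent vectors for three contigs.
latents = {
    "contig_1": [0.0, 0.0],
    "contig_2": [1.0, 0.0],
    "contig_3": [4.0, 4.0],
}
# Hypothetical assembly graph: only contig_1 and contig_2 are linked.
assembly_edges = [("contig_1", "contig_2")]

refined = refine_embeddings(latents, assembly_edges)
before = dist(latents["contig_1"], latents["contig_2"])
after = dist(refined["contig_1"], refined["contig_2"])
print(f"distance between linked contigs: {before} -> {after}")
```

In this toy setup the two linked contigs move closer together while the unlinked contig keeps its original vector, so a downstream clustering step would be more likely to place the linked pair in the same bin. A trained GNN would learn how to weight and transform neighbor information rather than use a fixed average.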