Paper Title

Graph identification of proteins in tomograms (GRIP-Tomo)

Authors

August George, Doo Nam Kim, Trevor Moser, Ian T. Gildea, James E. Evans, Margaret S. Cheung

Abstract

In this study, we present a pattern mining method based on network theory that identifies protein structures or complexes from synthetic volume densities without requiring predefined templates or introducing human bias during refinement. We hypothesized that the topological connectivity of a protein structure is invariant and distinctive enough to identify the protein from distorted data in volume densities. Three-dimensional densities of a protein or a complex from simulated tomographic volumes were transformed into mathematical graphs as observables. We systematically introduced data distortions or defects, such as missing fullness of data, the tumbling effect, and the missing wedge effect, into the simulated volumes, and varied the distance cutoffs (in pixels) to capture the varying connectivity between the density cluster centroids in the presence of defects. A similarity score between the graphs from the simulated volumes and the graphs transformed from the physical protein structures in point data was calculated by comparing their network theory order parameters, including node degrees, betweenness centrality, and graph densities. By capturing the essential topological features that define the heterogeneous morphologies of a network, we were able to accurately identify proteins and homo-multimeric complexes from ten topologically distinctive samples without realistic noise added. Our approach supports future developments in tomogram processing by providing interpretable pattern mining, with the aim of classifying single-domain protein native topologies, as well as distinguishing single-domain proteins from multimeric complexes, within noisy volumes.
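
The graph construction and comparison described in the abstract can be illustrated with a minimal Python sketch. This is not the GRIP-Tomo implementation: the function names, the use of mean values as summary features, and the scoring formula in `similarity` are all assumptions made for illustration, since the abstract does not specify how the order parameters are combined. Points stand in for density cluster centroids, and an edge joins any two points closer than a distance cutoff.

```python
from collections import deque
from itertools import combinations
from math import dist


def build_graph(points, cutoff):
    """Connect every pair of points whose Euclidean distance is <= cutoff."""
    adj = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def graph_density(adj):
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    edges = sum(len(nb) for nb in adj.values()) / 2
    return 0.0 if n < 2 else 2 * edges / (n * (n - 1))


def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}    # number of shortest paths from s
        depth = {v: -1 for v in adj}   # BFS distance from s
        sigma[s], depth[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if depth[w] < 0:
                    depth[w] = depth[v] + 1
                    queue.append(w)
                if depth[w] == depth[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):      # accumulate dependencies back toward s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph each pair is counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


def features(adj):
    """Summary features (an assumption): mean degree, mean betweenness, density."""
    n = len(adj)
    mean_degree = sum(len(nb) for nb in adj.values()) / n
    mean_bc = sum(betweenness(adj).values()) / n
    return (mean_degree, mean_bc, graph_density(adj))


def similarity(f_a, f_b):
    """Hypothetical score in [0, 1]: 1 minus the mean relative feature difference."""
    diffs = [abs(a - b) / max(abs(a), abs(b), 1e-12) for a, b in zip(f_a, f_b)]
    return 1.0 - sum(diffs) / len(diffs)


# Four collinear centroids, one pixel apart (toy data, not from the paper).
centroids = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
reference = features(build_graph(centroids, cutoff=1.5))  # chain: 0-1-2-3
looser = features(build_graph(centroids, cutoff=2.5))     # adds 0-2 and 1-3
print(reference, looser, similarity(reference, looser))
```

Raising the cutoff from 1.5 to 2.5 pixels adds two edges to the toy chain, which changes all three features and lowers the score. This mirrors why the abstract treats the distance cutoff as a parameter to vary when defects alter the apparent connectivity.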