Paper Title
Reconstruction of Protein Structures from Single-Molecule Time Series
Paper Authors
Paper Abstract
Single-molecule experimental techniques track the real-time dynamics of molecules by recording a small number of experimental observables. Following these observables provides a coarse-grained, low-dimensional representation of the conformational dynamics but does not furnish an atomistic representation of the instantaneous molecular structure. Takens' Delay Embedding Theorem asserts that, under quite general conditions, these low-dimensional time series can contain sufficient information to reconstruct the full molecular configuration of the system up to an a priori unknown transformation. By combining Takens' Theorem with tools from statistical thermodynamics, manifold learning, artificial neural networks, and rigid graph theory, we establish an approach, Single-molecule TAkens Reconstruction (STAR), to learn this transformation and reconstruct molecular configurations from time series in experimentally measurable observables such as intramolecular distances accessible to single-molecule Förster resonance energy transfer. We demonstrate the approach in applications to molecular dynamics simulations of a C24H50 polymer chain and the artificial mini-protein Chignolin. The trained models reconstruct molecular configurations from synthetic time series data in the head-to-tail molecular distances with atomistic root mean squared deviation accuracies better than 0.2 nm. This work demonstrates that it is possible to accurately reconstruct protein structures from time series in experimentally measurable observables and establishes the theoretical and algorithmic foundations to do so in applications to real experimental data.
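
To make the delay-embedding idea behind Takens' Theorem concrete, the sketch below stacks time-lagged copies of a scalar observable into delay vectors. It is a minimal illustration only and not the STAR pipeline: the function name `delay_embed`, the embedding dimension `dim`, the `lag`, and the synthetic signal standing in for a head-to-tail distance trace are all assumptions chosen for the example.

```python
import numpy as np

def delay_embed(series, dim, lag):
    """Stack `dim` lagged copies of a scalar series into delay vectors.

    Row t is [x(t), x(t + lag), ..., x(t + (dim - 1) * lag)].
    """
    x = np.asarray(series, dtype=float)
    n_vectors = x.size - (dim - 1) * lag
    if n_vectors <= 0:
        raise ValueError("series too short for this dim and lag")
    # Column i holds the series shifted forward by i * lag samples.
    return np.stack([x[i * lag : i * lag + n_vectors] for i in range(dim)], axis=1)

# Synthetic stand-in for a head-to-tail distance trace (hypothetical, not real data).
t = np.linspace(0.0, 20.0, 500)
distance = 1.0 + 0.3 * np.sin(t) + 0.1 * np.sin(2.7 * t)

# dim and lag are illustrative choices, not values taken from the paper.
embedded = delay_embed(distance, dim=3, lag=5)
print(embedded.shape)  # (490, 3)
```

Each row of `embedded` is one point in the delay space. In the approach the abstract describes, a learned transformation would then relate such points to molecular configurations; that step is not shown here.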