Paper Title
A Review of Machine Learning and Algorithmic Methods for Protein Phosphorylation Sites Prediction
Paper Authors
Paper Abstract
Post-translational modifications (PTMs) play key roles in extending the functional diversity of proteins and, as a result, in regulating diverse cellular processes in prokaryotic and eukaryotic organisms. Phosphorylation is a vital PTM that occurs in most proteins and plays a significant role in many biological processes. Disorders in the phosphorylation process lead to multiple diseases, including neurological disorders and cancers. The purpose of this review is to organize the body of knowledge associated with phosphorylation site (p-site) prediction in order to facilitate future research in this field. First, we comprehensively review all related databases and describe each step of dataset creation, data preprocessing, and method evaluation in p-site prediction. Next, we investigate p-site prediction methods, which fall into two computational groups: algorithmic and machine learning (ML). We show that ML-based p-site prediction follows two main approaches, conventional methods and end-to-end deep learning methods, and give an overview of both. Moreover, we introduce the most important feature extraction techniques used in p-site prediction. Finally, we created three test sets from new proteins in the 2022 release of the dbPTM database, for general and human species. Evaluation of the available online tools on these test sets showed quite poor performance for p-site prediction.
Keywords: Phosphorylation, Machine Learning, Deep Learning, Post-Translational Modification, Databases
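The abstract mentions dataset creation, preprocessing, and feature extraction for p-site prediction. A step shared by many sequence-based predictors is to cut a fixed-length window around each candidate serine (S), threonine (T), or tyrosine (Y) residue and encode it numerically. This is offered as a general illustration, not as the procedure used in this review. The sketch is minimal and rests on stated assumptions: the 15-residue window (half_width=7), the "X" padding symbol, and the helper names extract_windows and one_hot are all hypothetical choices.

```python
from typing import List, Tuple

# Assumption: 20 standard amino acids plus "X" as padding/unknown symbol
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
ALPHABET = AMINO_ACIDS + "X"
INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def extract_windows(sequence: str, half_width: int = 7,
                    targets: str = "STY") -> List[Tuple[int, str]]:
    """Return (0-based position, window) for every candidate residue.

    Each window has length 2 * half_width + 1 and is centred on the
    candidate residue; positions beyond either terminus are padded with "X".
    (Hypothetical helper; window size is an illustrative assumption.)
    """
    padded = "X" * half_width + sequence + "X" * half_width
    windows = []
    for pos, residue in enumerate(sequence):
        if residue in targets:
            # In the padded string the residue sits at pos + half_width,
            # so its window spans padded[pos : pos + 2 * half_width + 1].
            windows.append((pos, padded[pos:pos + 2 * half_width + 1]))
    return windows


def one_hot(window: str) -> List[List[int]]:
    """One-hot encode a window; residues outside ALPHABET map to the "X" column."""
    encoded = []
    for aa in window:
        row = [0] * len(ALPHABET)
        row[INDEX.get(aa, INDEX["X"])] = 1
        encoded.append(row)
    return encoded


if __name__ == "__main__":
    for pos, win in extract_windows("MSKTAY", half_width=2):
        print(pos, win, len(one_hot(win)))
```

Each window would then be labeled positive or negative according to whether the site is annotated as phosphorylated in a source database, and its one-hot matrix (or another encoding) fed to a conventional classifier or a deep learning model.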