Paper Title
Multiclass Disease Predictions Based on Integrated Clinical and Genomics Datasets
Paper Authors
Abstract
Computational clinical prediction from clinical data is common in bioinformatics, but predictions that also draw on genomics datasets are rarely reported. Precision medicine research requires information from all available datasets to provide intelligent clinical solutions. In this paper, we build a prediction model that uses information from both clinical and genomics datasets, and we demonstrate multiclass disease prediction on the combined data using machine learning methods. We created an integrated dataset from a clinical dataset (ClinVar) and a genomics dataset (gene expression) and trained an instance-based learner on it to predict clinical diseases. We used a simple but innovative approach to multiclass classification, with as many as 75 output classes, and Principal Component Analysis for feature selection. The classifier predicted diseases with 73% accuracy on the integrated dataset. The results were consistent and competitive with those of other classification models. They show that genomics information can be reliably included in datasets for clinical prediction and may prove valuable in clinical diagnostics and precision medicine.
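The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: join clinical and genomics features into one record, then classify with an instance-based learner (here, k-nearest neighbours by majority vote). The function names `integrate` and `knn_predict`, the record IDs, the feature values, and the disease labels are all hypothetical and chosen for illustration. The Principal Component Analysis step and the authors' specific multiclass scheme are not reproduced.

```python
import math
from collections import Counter


def integrate(clinical, expression):
    """Concatenate clinical and expression features for records present in both tables."""
    shared = sorted(set(clinical) & set(expression))
    return {rid: clinical[rid] + expression[rid] for rid in shared}


def knn_predict(train_X, train_y, query, k=3):
    """Predict a label by majority vote among the k nearest training rows (Euclidean distance)."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: math.dist(train_X[i], query),
    )[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two clinical features and two expression values per record.
clinical = {"r1": [1.0, 0.0], "r2": [0.9, 0.1], "r3": [0.0, 1.0], "r4": [0.1, 0.9]}
expression = {"r1": [5.0, 1.0], "r2": [4.8, 1.2], "r3": [1.0, 5.0], "r4": [1.1, 4.7]}
labels = {"r1": "disease_A", "r2": "disease_A", "r3": "disease_B", "r4": "disease_B"}

integrated = integrate(clinical, expression)
record_ids = list(integrated)
train_X = [integrated[rid] for rid in record_ids]
train_y = [labels[rid] for rid in record_ids]

# Classify an unseen record described by the same four integrated features.
prediction = knn_predict(train_X, train_y, [0.95, 0.05, 4.9, 1.1], k=3)
print(prediction)
```

An instance-based learner handles many output classes without modification, since each prediction is a vote over stored examples. That may be one reason such a learner suits a task with up to 75 classes, though the abstract does not state the authors' rationale.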