Paper title
Image and graph convolution networks improve microbiome-based machine learning accuracy
Paper authors
Abstract
The human gut microbiome is associated with the etiology of a large number of diseases. As such, it is a natural candidate for machine-learning-based biomarker development for multiple diseases and conditions. The microbiome is often analyzed using 16S rRNA gene sequencing. However, several properties of 16S rRNA gene sequencing data hinder machine learning, including non-uniform representation, a small number of samples compared with the dimension of each sample, and sparsity, with most bacteria present in only a small subset of samples. We propose two novel methods, iMic and gMic, that use bacterial taxonomy to combine information from different bacteria and improve the data representation for machine learning. iMic and gMic translate the microbiome into images and graphs, respectively, and convolutional neural networks are then applied to the resulting image or graph. We show that both algorithms improve the performance of static 16S rRNA gene sequence-based machine learning compared with the best state-of-the-art methods. Furthermore, these methods ease the interpretation of the classifiers. iMic is then extended to dynamic microbiome samples, and an iMic explainable AI algorithm is proposed to detect the bacteria relevant to each condition.
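As a rough illustration of the two taxonomy-based representations described in the abstract, the Python sketch below turns one sample's abundances into (a) an image whose rows are taxonomic levels and whose columns are leaf taxa, and (b) a taxonomy graph whose nodes are taxa at every level, linked parent to child. This is not the authors' iMic or gMic implementation. The input format (semicolon-separated taxonomy strings), the use of the mean over descendant leaves for image cells, the summed abundance as node feature, the fixed alphabetical column order, and all function names (`taxonomy_image`, `conv2d_valid`, `taxonomy_graph`, `gcn_propagate`) are hypothetical. `gcn_propagate` shows only the normalized neighbourhood aggregation D^-1/2 (A + I) D^-1/2 x of a graph convolution layer, with learned weights and the nonlinearity omitted.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical example sample: semicolon-separated taxonomy -> relative abundance.
# Names and values are illustrative only, not data from the paper.
SAMPLE = {
    "k__Bacteria;p__Firmicutes;c__Clostridia": 0.4,
    "k__Bacteria;p__Firmicutes;c__Bacilli": 0.1,
    "k__Bacteria;p__Bacteroidetes;c__Bacteroidia": 0.5,
}


def taxonomy_image(sample):
    """Image sketch: row = taxonomic level, column = leaf taxon.

    Each cell holds the mean abundance of all leaves sharing the column's
    ancestor at that level. Columns use a fixed alphabetical order (assumption:
    no clustering-based reordering). Assumes all taxonomy strings have equal depth.
    """
    leaves = sorted(sample)
    paths = [leaf.split(";") for leaf in leaves]
    image = []
    for level in range(len(paths[0])):
        groups = defaultdict(list)  # ancestor prefix -> abundances of its leaves
        for leaf, path in zip(leaves, paths):
            groups[tuple(path[: level + 1])].append(sample[leaf])
        row = []
        for path in paths:
            values = groups[tuple(path[: level + 1])]
            row.append(sum(values) / len(values))
        image.append(row)
    return image


def conv2d_valid(image, kernel):
    """Plain 'valid' 2-D cross-correlation, the operation a CNN layer slides over an image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def taxonomy_graph(sample):
    """Graph sketch: one node per taxon at every level, edges link parent to child.

    Node feature = summed abundance of the node's descendant leaves (assumption).
    """
    index, feats, edges = {}, [], set()
    for leaf, value in sample.items():
        path = leaf.split(";")
        parent = None
        for level in range(len(path)):
            node = ";".join(path[: level + 1])
            if node not in index:
                index[node] = len(feats)
                feats.append(0.0)
            feats[index[node]] += value
            if parent is not None:
                edges.add((index[parent], index[node]))
            parent = node
    return index, feats, edges


def gcn_propagate(n, edges, feats):
    """One normalized aggregation step, D^-1/2 (A + I) D^-1/2 x, as in a GCN layer.

    The learned weight matrix and nonlinearity are omitted in this sketch.
    """
    neighbours = [{i} for i in range(n)]  # self-loops supply the "+ I" term
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    degree = [len(s) for s in neighbours]
    return [
        sum(feats[j] / sqrt(degree[i] * degree[j]) for j in neighbours[i])
        for i in range(n)
    ]


if __name__ == "__main__":
    img = taxonomy_image(SAMPLE)
    print(img[1])  # phylum row: [0.5, 0.25, 0.25]
    print(conv2d_valid(img, [[0.25, 0.25], [0.25, 0.25]]))  # 2x2 feature map
    index, feats, edges = taxonomy_graph(SAMPLE)
    print(gcn_propagate(len(feats), edges, feats))
```

In this sketch, adjacent image columns share ancestors, so a small kernel mixes information from related taxa; the graph version mixes the same information through parent-child edges instead of spatial neighbourhoods.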