Paper title

Further analysis of metagenomic datasets containing GD and GX pangolin CoVs indicates widespread contamination, undermining pangolin host attribution

Authors

Jones, Adrian; Massey, Steven E.; Zhang, Daoyu; Deigin, Yuri; Quay, Steven C.

Abstract

The only animals other than bats reported to have been infected with SARS-CoV-2-related coronaviruses (SARS2r-CoVs) prior to the COVID-19 pandemic are pangolins. In early 2020, multiple papers reported the identification of two clades of SARS2r-CoVs, GD and GX, infecting pangolins. However, the RNA-Seq datasets supporting pangolin genome assembly were widely contaminated, contained synthetic vectors, or were so heavily enriched or filtered that little but coronavirus sequences remained. Here we investigate two pangolin fecal samples sequenced by Li et al. (2021), provided in support of GD PCoV infection of pangolins in Guangdong. We find that the read distribution is consistent with PCR amplicon contamination and SARS-CoV-2 contamination, and we further identify the presence of synthetic plasmid sequences. We also build upon our previous work to further analyze the dataset GX/P3B, the only pangolin tissue dataset sequenced by Lam et al. (2020) that was not enriched or heavily filtered. We identify synthetic vectors and confirm human genomic origin samples in the dataset. Finally, we find human mitochondrial sequences in all pangolin organ datasets, and mouse and tiger mitochondrial sequences in selected pangolin organ datasets, sequenced by Liu et al. (2019). We infer that the human and mouse genomic origin sequences were probably introduced by contamination prior to sequencing, while the tiger origin sequence contamination may have occurred through index hopping during sequencing. These observations are problematic for attributing pangolins as SARS2r-CoV hosts in the datasets examined. The forensic methods developed and used here can be applied to examine any third-party SRA dataset.