Paper title
eDNAPlus: A unifying modelling framework for DNA-based biodiversity monitoring
Paper authors
Paper abstract
DNA-based biodiversity surveys involve collecting physical samples from survey sites and assaying the contents in the laboratory to detect species via their diagnostic DNA sequences. DNA-based surveys are increasingly being adopted for biodiversity monitoring. The most commonly employed method is metabarcoding, which combines PCR with high-throughput DNA sequencing to amplify and then read 'DNA barcode' sequences. This process generates count data indicating the number of times each DNA barcode was read. However, DNA-based data are noisy and error-prone, with several sources of variation. In this paper, we present a unifying modelling framework for DNA-based data that allows for all key sources of variation and error in the data-generating process. The model can estimate within-species biomass changes across sites and link those changes to environmental covariates, while accounting for correlation between species and between sites. Inference is performed using MCMC, where we employ Gibbs or Metropolis-Hastings updates with Laplace approximations. We also implement a re-parameterisation scheme, appropriate for crossed-effects models, which leads to improved mixing, and an adaptive approach for updating latent variables, which reduces computation time. We discuss study design and present theoretical and simulation results to guide decisions on replication at different stages and on the use of quality control methods. We demonstrate the new framework on a dataset of Malaise-trap samples. We quantify the effects of elevation and distance-to-road on each species, infer species correlations, and produce maps identifying areas of high biodiversity, which can be used to rank areas by conservation value. We estimate the level of noise between sites and within sample replicates, and the probabilities of error at the PCR stage, which are close to zero for most species considered, validating the employed laboratory processing.
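The abstract does not spell out how the Metropolis-Hastings updates with Laplace approximations are built. As a rough illustration of the general idea only, the sketch below applies it to a toy model that is not the eDNAPlus model: read counts are assumed to be Poisson with a single log-intensity `theta`, which is given a zero-mean normal prior. The function names, the prior variance, and the example counts are all hypothetical. The Laplace approximation (a normal distribution centred at the posterior mode, with variance taken from the curvature there) serves as an independence proposal inside Metropolis-Hastings.

```python
import math
import random


def log_post(theta, total, n, prior_var):
    """Unnormalised log posterior for the toy model (an assumption, not eDNAPlus):
    y_i ~ Poisson(exp(theta)), theta ~ Normal(0, prior_var)."""
    return total * theta - n * math.exp(theta) - theta * theta / (2.0 * prior_var)


def laplace_approx(total, n, prior_var, max_iter=100):
    """Find the posterior mode by Newton's method and return (mode, variance),
    where variance = -1 / (second derivative of the log posterior at the mode)."""
    theta = math.log((total + 0.5) / n)  # start near the maximum-likelihood value
    for _ in range(max_iter):
        grad = total - n * math.exp(theta) - theta / prior_var
        hess = -n * math.exp(theta) - 1.0 / prior_var
        step = grad / hess
        theta -= step
        if abs(step) < 1e-12:
            break
    hess = -n * math.exp(theta) - 1.0 / prior_var
    return theta, -1.0 / hess


def mh_with_laplace_proposal(counts, prior_var=10.0, n_iter=5000, seed=1):
    """Independence Metropolis-Hastings sampler whose proposal is the Laplace
    approximation. Returns (draws, acceptance_rate, mode)."""
    rng = random.Random(seed)
    total, n = sum(counts), len(counts)
    mode, var = laplace_approx(total, n, prior_var)
    sd = math.sqrt(var)

    def log_q(x):
        # Normal proposal log density, up to a constant that cancels in the ratio.
        return -0.5 * (x - mode) ** 2 / var

    theta = mode
    draws, accepted = [], 0
    for _ in range(n_iter):
        prop = rng.gauss(mode, sd)
        # Target ratio times reverse/forward proposal ratio, on the log scale.
        log_ratio = (log_post(prop, total, n, prior_var)
                     - log_post(theta, total, n, prior_var)
                     + log_q(theta) - log_q(prop))
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_ratio:
            theta = prop
            accepted += 1
        draws.append(theta)
    return draws, accepted / n_iter, mode


if __name__ == "__main__":
    toy_counts = [3, 5, 4, 6, 2]  # made-up read counts, for illustration only
    draws, rate, mode = mh_with_laplace_proposal(toy_counts)
    print("posterior mode:", mode)
    print("acceptance rate:", rate)
    print("posterior mean of theta:", sum(draws) / len(draws))
```

Because the proposal closely tracks the posterior in this toy setting, most proposals are accepted. In a larger hierarchical model, one would expect a step of this kind to be applied to one block of parameters at a time, conditional on the rest.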