Paper Title
GenPIP: In-Memory Acceleration of Genome Analysis via Tight Integration of Basecalling and Read Mapping
Authors
Abstract
Nanopore sequencing is a widely used high-throughput genome sequencing technology that can sequence long fragments of a genome into raw electrical signals at low cost. It requires two computationally costly processing steps for accurate downstream genome analysis. The first step, basecalling, translates the raw electrical signals into nucleotide bases (i.e., A, C, G, T). The second step, read mapping, finds the correct location of a read in a reference genome. In existing genome analysis pipelines, basecalling and read mapping are executed separately. We observe in this work that such separate execution of the two most time-consuming steps inherently leads to (1) significant data movement and (2) redundant computations on the data, both of which slow down the genome analysis pipeline. This paper proposes GenPIP, an in-memory genome analysis accelerator that tightly integrates basecalling and read mapping. GenPIP improves the performance of the genome analysis pipeline with two key mechanisms: (1) in-memory, fine-grained collaborative execution of the major genome analysis steps in parallel; and (2) a new technique for early rejection of low-quality and unmapped reads, which stops the analysis of such reads as soon as possible and thereby avoids wasted computation. Our experiments show that, for the execution of the genome analysis pipeline, GenPIP provides 41.6× (8.4×) speedup and 32.8× (20.8×) energy savings with negligible accuracy loss compared to state-of-the-art software genome analysis tools executed on a state-of-the-art CPU (GPU). Compared to a design that combines state-of-the-art in-memory basecalling and read mapping accelerators, GenPIP provides 1.39× speedup and 1.37× energy savings.
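The two mechanisms named in the abstract, chunk-level interleaving of basecalling with read mapping and early rejection of low-quality or unmapped reads, can be illustrated with a minimal software sketch. This is not GenPIP's actual algorithm or hardware design: the function names, the quality threshold, the unmapped-chunk limit, and the toy basecaller and mapper below are all hypothetical assumptions chosen only to show the control flow.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical types: a basecalled chunk is (bases, mean quality score).
BasecalledChunk = Tuple[str, float]


def process_read(
    chunks: Iterable[object],
    basecall_chunk: Callable[[object], BasecalledChunk],
    chunk_maps: Callable[[str], bool],
    min_quality: float = 7.0,       # assumed threshold, not from the paper
    max_unmapped_chunks: int = 2,   # assumed limit, not from the paper
) -> Tuple[str, str, int]:
    """Basecall and map a read chunk by chunk, stopping early when possible.

    Returns (status, bases_so_far, chunks_processed).
    """
    bases: List[str] = []
    unmapped = 0
    processed = 0
    for chunk in chunks:
        processed += 1
        seq, quality = basecall_chunk(chunk)
        # Early rejection 1: a low-quality chunk ends work on this read.
        if quality < min_quality:
            return ("rejected_low_quality", "".join(bases), processed)
        bases.append(seq)
        # The chunk is handed to mapping immediately, instead of waiting
        # for the whole read to be basecalled first.
        if not chunk_maps(seq):
            unmapped += 1
            # Early rejection 2: too many chunks failed to map.
            if unmapped >= max_unmapped_chunks:
                return ("rejected_unmapped", "".join(bases), processed)
    return ("mapped", "".join(bases), processed)


# Toy stand-ins (assumptions): each "raw chunk" already carries its bases and
# quality, and "mapping" is an exact substring search in a tiny reference.
REFERENCE = "ACGTACGTTTGACCA"


def toy_basecall(chunk: object) -> BasecalledChunk:
    return chunk  # type: ignore[return-value]


def toy_maps(seq: str) -> bool:
    return seq in REFERENCE


good_read = [("ACGT", 10.0), ("ACGT", 9.0), ("TTGA", 11.0)]
low_quality_read = [("ACGT", 10.0), ("ACGT", 3.0), ("TTGA", 11.0)]
unmapped_read = [("GGGG", 10.0), ("CCCC", 10.0), ("ACGT", 10.0)]

for read in (good_read, low_quality_read, unmapped_read):
    print(process_read(read, toy_basecall, toy_maps))
```

In this sketch the rejected reads stop after two chunks rather than three. That saving is the one early rejection aims for: the remaining chunks are never basecalled or mapped.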