Paper Title
Heterogeneous Data-Centric Architectures for Modern Data-Intensive Applications: Case Studies in Machine Learning and Databases
Paper Author
Paper Abstract
Today's computing systems require moving data back and forth between computing resources (e.g., CPUs, GPUs, accelerators) and off-chip main memory so that computation can take place on the data. Unfortunately, this data movement is a major bottleneck for system performance and energy consumption. One promising execution paradigm that alleviates the data movement bottleneck in modern and emerging applications is processing-in-memory (PIM), where the cost of data movement to/from main memory is reduced by placing computation capabilities close to memory. Naively employing PIM to accelerate data-intensive workloads can lead to suboptimal performance due to the many design constraints PIM substrates impose. Therefore, many recent works co-design specialized PIM accelerators and algorithms to improve performance and reduce energy consumption for (i) applications from various domains and (ii) various computing environments, including cloud systems, mobile systems, and edge devices. We showcase the benefits of co-designing algorithms and hardware in a way that efficiently takes advantage of the PIM paradigm for two modern data-intensive applications: (1) machine learning inference models for edge devices and (2) hybrid transactional/analytical processing (HTAP) databases for cloud systems. We follow a two-step approach in our system design. In the first step, we extensively analyze the computation and memory access patterns of each application to gain insights into its hardware/software requirements and the major sources of performance and energy bottlenecks in processor-centric systems. In the second step, we leverage the insights from the first step to co-design algorithms and hardware accelerators that enable a high-performance, energy-efficient data-centric architecture for each application.