Reliability Assessment of Neural Networks in GPUs: A Framework for Permanent Faults

Paper Title

Reliability Assessment of Neural Networks in GPUs: A Framework For Permanent Faults Injections

Authors

Juan-David Guerrero-Balaguera, Luigi Galasso, Robert Limas Sierra, Matteo Sonza Reorda

Abstract

Deep learning, and Convolutional Neural Networks (CNNs) in particular, has become a fundamental computational approach applied in a wide range of domains, including safety-critical applications (e.g., automotive, robotics, and healthcare equipment). Evaluating the reliability of these computational systems is therefore mandatory. The reliability of CNNs is typically evaluated through fault injection campaigns at different levels of abstraction, from the application level down to the hardware level. Many works have focused on evaluating the reliability of neural networks in the presence of transient faults. The effects of permanent faults, however, have been investigated only at the application level, e.g., by targeting the parameters of the network. This paper proposes a framework that resorts to a binary instrumentation tool to perform fault injection campaigns targeting different components inside the GPU, such as the register files and the functional units. For the first time, this environment allows assessing the reliability of CNNs deployed on a GPU in the presence of permanent faults.
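To make the distinction between transient and permanent faults concrete, the following is a minimal sketch in Python. It assumes a stuck-at fault model (one bit of a 32-bit register permanently forced to 0 or 1), which the abstract does not specify, and every function name here is hypothetical. It does not reproduce the paper's framework, which instruments GPU binaries; it only illustrates that a permanent fault corrupts every value written to the faulty location rather than a single one.

```python
import struct


def float_to_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b: int) -> float:
    """Interpret a 32-bit pattern as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]


def stuck_at(value: float, bit: int, stuck_value: int) -> float:
    """Force one bit of a 32-bit register value to 0 or 1 (assumed fault model)."""
    bits = float_to_bits(value)
    mask = 1 << bit
    bits = (bits | mask) if stuck_value else (bits & ~mask)
    return bits_to_float(bits)


def dot_with_permanent_fault(weights, inputs, bit, stuck_value):
    """Dot product whose accumulator register has a stuck-at bit.

    The fault is applied on every write to the accumulator, which is what
    distinguishes a permanent fault from a one-off transient bit flip.
    """
    acc = 0.0
    for w, x in zip(weights, inputs):
        acc = stuck_at(acc + w * x, bit, stuck_value)
    return acc


if __name__ == "__main__":
    weights, inputs = [1.0, 2.0], [3.0, 4.0]
    # Sign bit stuck at 0: all partial sums are positive, so nothing changes.
    print(dot_with_permanent_fault(weights, inputs, bit=31, stuck_value=0))
    # Sign bit stuck at 1: every partial sum is forced negative.
    print(dot_with_permanent_fault(weights, inputs, bit=31, stuck_value=1))
```

In this toy case the same physical defect is either fully masked or corrupts the result, depending on the stuck value and on the data flowing through the register, which is one reason such faults are usually studied through injection campaigns over many fault locations.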
