Paper Title

CNN-based Multistage Gated Average Fusion (MGAF) for Human Action Recognition Using Depth and Inertial Sensors

Paper Authors

Ahmad, Zeeshan; Khan, Naimul

Paper Abstract

Convolutional Neural Networks (CNNs) provide leverage to extract and fuse features from all layers of their architecture. However, extracting and fusing intermediate features from different layers of a CNN remains uninvestigated for Human Action Recognition (HAR) using depth and inertial sensors. To get maximum benefit from accessing all of the CNN's layers, in this paper we propose a novel Multistage Gated Average Fusion (MGAF) network, which extracts and fuses features from all layers of the CNN using our novel and computationally efficient Gated Average Fusion (GAF) network, a decisive integral element of MGAF. At the input of the proposed MGAF, we transform the depth and inertial sensor data into depth images called Sequential Front view Images (SFI) and Signal Images (SI), respectively. These SFI are formed from the front view information generated by the depth data. A CNN is employed to extract feature maps from both input modalities. The GAF network fuses the extracted features effectively while preserving the dimensionality of the fused features. The proposed MGAF network is structurally extensible and can be unfolded to more than two modalities. Experiments on three publicly available multimodal HAR datasets demonstrate that the proposed MGAF outperforms previous state-of-the-art fusion methods for depth-inertial HAR in terms of recognition accuracy while being computationally much more efficient. We increase the accuracy by an average of 1.5 percent while reducing the computational cost by approximately 50 percent over the previous state of the art.
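The abstract describes GAF as fusing the feature maps of two modalities while keeping the dimensionality of the fused features unchanged. The following is a minimal sketch of one way such a gated average fusion block could look in PyTorch; the sigmoid gate computed from the concatenated feature maps, the 1x1 convolution, the class name, and the tensor shapes are illustrative assumptions, not the authors' exact formulation.

```python
# Hypothetical sketch of a Gated Average Fusion (GAF) block: a sigmoid gate,
# computed from the concatenated depth and inertial feature maps, weights an
# element-wise average of the two modalities. Assumes both feature maps share
# the same channel count and spatial size; details are illustrative only.
import torch
import torch.nn as nn

class GatedAverageFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolution maps the concatenated features back to the original
        # channel count, so the fused output preserves the input dimensionality.
        self.gate_conv = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_depth: torch.Tensor, feat_inertial: torch.Tensor) -> torch.Tensor:
        # Gate in [0, 1] decides, per element, how much each modality contributes.
        gate = torch.sigmoid(self.gate_conv(torch.cat([feat_depth, feat_inertial], dim=1)))
        return gate * feat_depth + (1.0 - gate) * feat_inertial

# Example: fuse 64-channel feature maps from the SFI and SI branches.
gaf = GatedAverageFusion(channels=64)
fused = gaf(torch.randn(1, 64, 28, 28), torch.randn(1, 64, 28, 28))
print(fused.shape)  # torch.Size([1, 64, 28, 28]) -- dimensionality preserved
```

In a multistage setting such as the one the abstract outlines, a block like this could be applied to the feature maps at each CNN stage, which is consistent with the stated property that the fused features keep the same dimensions as the per-modality features.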
