2D versus 3D Convolutional Spiking Neural Networks Trained with Unsupervised STDP for Human Action Recognition

Paper title

2D versus 3D Convolutional Spiking Neural Networks Trained with Unsupervised STDP for Human Action Recognition

Authors

Mireille El-Assal, Pierre Tirilly, Ioan Marius Bilasco

Abstract

Current advances in technology have highlighted the importance of video analysis in the domain of computer vision. However, video analysis has considerably high computational costs with traditional artificial neural networks (ANNs). Spiking neural networks (SNNs) are third-generation, biologically plausible models that process information in the form of spikes. Unsupervised learning with SNNs using the spike-timing-dependent plasticity (STDP) rule has the potential to overcome some bottlenecks of regular artificial neural networks, but STDP-based SNNs are still immature and their performance is far behind that of ANNs. In this work, we study the performance of SNNs on the task of human action recognition, because this task has many real-time applications in computer vision, such as video surveillance. We introduce a multi-layered 3D convolutional SNN model trained with unsupervised STDP. We compare the performance of this model to that of a 2D STDP-based SNN on the KTH and Weizmann datasets. We also compare single-layer and multi-layer versions of these models in order to get an accurate assessment of their performance. We show that STDP-based convolutional SNNs can learn motion patterns using 3D kernels, thus enabling motion-based recognition from videos. Finally, we give evidence that 3D convolution is superior to 2D convolution with STDP-based SNNs, especially when dealing with long video sequences.
