Paper Title
HARDVS: Revisiting Human Activity Recognition with Dynamic Vision Sensors
Paper Authors
Paper Abstract
Mainstream human activity recognition (HAR) algorithms are developed based on RGB cameras, which suffer from challenging illumination, fast motion, privacy leakage, and large energy consumption. Meanwhile, biologically inspired event cameras have attracted great interest due to their unique features, such as high dynamic range, dense temporal but sparse spatial resolution, low latency, and low power. As the event camera is a newly arising sensor, there is still no realistic large-scale event-based dataset for HAR. Considering its great practical value, in this paper we propose a large-scale benchmark dataset, termed HARDVS, to bridge this gap; it contains 300 categories and more than 100K event sequences. We evaluate and report the performance of multiple popular HAR algorithms, providing extensive baselines for future works to compare against. More importantly, we propose a novel spatial-temporal feature learning and fusion framework, termed ESTF, for event-stream-based human activity recognition. It first projects the event streams into spatial and temporal embeddings using StemNet, then encodes and fuses the dual-view representations using Transformer networks. Finally, the dual features are concatenated and fed into a classification head for activity prediction. Extensive experiments on multiple datasets fully validate the effectiveness of our model. Both the dataset and source code will be released at \url{https://github.com/Event-AHU/HARDVS}.
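The dual-branch pipeline described in the abstract (stem projections into spatial and temporal views, per-view Transformer encoding, concatenation, classification head) can be sketched as follows. This is a minimal illustrative sketch, not the authors' released implementation: the input feature size, token count, layer counts, and the use of plain linear layers as a stand-in for StemNet are all assumptions.

```python
import torch
import torch.nn as nn


class ESTFSketch(nn.Module):
    """Hedged sketch of the ESTF dual-view design: all sizes and
    module choices here are illustrative assumptions."""

    def __init__(self, in_dim=128, embed_dim=64, num_heads=4, num_classes=300):
        super().__init__()
        # Stand-in for StemNet: project event-frame features into
        # separate spatial and temporal embeddings.
        self.spatial_stem = nn.Linear(in_dim, embed_dim)
        self.temporal_stem = nn.Linear(in_dim, embed_dim)
        # One Transformer encoder layer per view (kept shallow for brevity).
        def make_encoder():
            layer = nn.TransformerEncoderLayer(
                d_model=embed_dim, nhead=num_heads, batch_first=True)
            return nn.TransformerEncoder(layer, num_layers=1)
        self.spatial_encoder = make_encoder()
        self.temporal_encoder = make_encoder()
        # Classification head over the concatenated dual-view features.
        self.head = nn.Linear(2 * embed_dim, num_classes)

    def forward(self, events):
        # events: (batch, tokens, in_dim) event-stream features.
        s = self.spatial_encoder(self.spatial_stem(events))
        t = self.temporal_encoder(self.temporal_stem(events))
        # Pool each view over tokens, then concatenate and classify.
        fused = torch.cat([s.mean(dim=1), t.mean(dim=1)], dim=-1)
        return self.head(fused)


model = ESTFSketch()
logits = model(torch.randn(2, 16, 128))
print(tuple(logits.shape))  # (2, 300): one score per HARDVS category
```

The 300-way head matches the 300 activity categories reported for HARDVS; everything else (fusion by mean-pooling and concatenation, single-layer encoders) is a simplification of the framework the abstract outlines.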