Paper Title
DG-STGCN: Dynamic Spatial-Temporal Modeling for Skeleton-based Action Recognition
Paper Authors
Paper Abstract
Graph convolution networks (GCN) have been widely used in skeleton-based action recognition. We note that existing GCN-based approaches primarily rely on prescribed graphical structures (i.e., a manually defined topology of skeleton joints), which limits their flexibility to capture complicated correlations between joints. To move beyond this limitation, we propose a new framework for skeleton-based action recognition, namely Dynamic Group Spatio-Temporal GCN (DG-STGCN). It consists of two modules, DG-GCN and DG-TCN, for spatial and temporal modeling, respectively. In particular, DG-GCN uses learned affinity matrices to capture dynamic graphical structures instead of relying on a prescribed one, while DG-TCN performs group-wise temporal convolutions with varying receptive fields and incorporates a dynamic joint-skeleton fusion module for adaptive multi-level temporal modeling. On a wide range of benchmarks, including NTURGB+D, Kinetics-Skeleton, BABEL, and Toyota SmartHome, DG-STGCN consistently outperforms state-of-the-art methods, often by a notable margin.
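To make the two ideas in the abstract concrete, below is a minimal PyTorch sketch of (a) a spatial block whose affinity matrices are trainable parameters, in the spirit of DG-GCN, and (b) a group-wise multi-branch temporal convolution with varying dilations, in the spirit of DG-TCN. All class names, parameter names, and shapes here are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the abstract's two modules; not the official code.
import torch
import torch.nn as nn


class DynamicGCN(nn.Module):
    """Spatial modeling: each affinity matrix is a trainable parameter, so the
    joint topology is learned end-to-end rather than prescribed by hand."""

    def __init__(self, in_ch: int, out_ch: int, num_joints: int = 25, groups: int = 8):
        super().__init__()
        self.groups = groups
        # One learned (num_joints x num_joints) affinity matrix per group.
        self.affinity = nn.Parameter(torch.randn(groups, num_joints, num_joints) * 0.01)
        self.proj = nn.Conv2d(in_ch, out_ch * groups, kernel_size=1)
        self.out = nn.Conv2d(out_ch * groups, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, T, V) -- batch, channels, frames, joints
        n, _, t, v = x.shape
        feat = self.proj(x).view(n, self.groups, -1, t, v)
        # Aggregate joint features with each group's learned affinity matrix.
        feat = torch.einsum('ngctv,gvw->ngctw', feat, self.affinity)
        return self.out(feat.reshape(n, -1, t, v))


class GroupTCN(nn.Module):
    """Temporal modeling: channels are split into groups, each handled by a
    temporal convolution with a different dilation (varying receptive field)."""

    def __init__(self, channels: int, dilations=(1, 2, 3, 4)):
        super().__init__()
        assert channels % len(dilations) == 0
        gc = channels // len(dilations)
        self.branches = nn.ModuleList(
            nn.Conv2d(gc, gc, kernel_size=(3, 1), padding=(d, 0), dilation=(d, 1))
            for d in dilations
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        chunks = x.chunk(len(self.branches), dim=1)
        return torch.cat([b(c) for b, c in zip(self.branches, chunks)], dim=1)


if __name__ == "__main__":
    x = torch.randn(2, 64, 32, 25)   # (batch, channels, frames, joints)
    y = GroupTCN(128)(DynamicGCN(64, 128)(x))
    print(y.shape)                   # torch.Size([2, 128, 32, 25])
```

The sketch omits pieces the abstract mentions but does not specify, such as the dynamic joint-skeleton fusion module, residual connections, and normalization; it only shows why learned affinities and grouped dilated temporal convolutions make the topology and receptive field dynamic.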