Paper Title
DG-STGCN: Dynamic Spatial-Temporal Modeling for Skeleton-based Action Recognition
Paper Authors
Paper Abstract
Graph convolution networks (GCN) have been widely used in skeleton-based action recognition. We note that existing GCN-based approaches primarily rely on prescribed graphical structures (i.e., a manually defined topology of skeleton joints), which limits their flexibility to capture complicated correlations between joints. To move beyond this limitation, we propose a new framework for skeleton-based action recognition, namely Dynamic Group Spatio-Temporal GCN (DG-STGCN). It consists of two modules, DG-GCN and DG-TCN, for spatial and temporal modeling, respectively. In particular, DG-GCN uses learned affinity matrices to capture dynamic graphical structures instead of relying on a prescribed one, while DG-TCN performs group-wise temporal convolutions with varying receptive fields and incorporates a dynamic joint-skeleton fusion module for adaptive multi-level temporal modeling. On a wide range of benchmarks, including NTURGB+D, Kinetics-Skeleton, BABEL, and Toyota SmartHome, DG-STGCN consistently outperforms state-of-the-art methods, often by a notable margin.
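To make the two ideas in the abstract concrete, below is a minimal PyTorch sketch of (a) a spatial block whose affinity matrices are trainable parameters, in the spirit of DG-GCN, and (b) a group-wise multi-branch temporal convolution with varying dilations, in the spirit of DG-TCN. All class names, parameter names, and shapes here are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the abstract's two modules; not the official code.
import torch
import torch.nn as nn


class DynamicGCN(nn.Module):
    """Spatial modeling: each affinity matrix is a trainable parameter, so the
    joint topology is learned end-to-end rather than prescribed by hand."""

    def __init__(self, in_ch: int, out_ch: int, num_joints: int = 25, groups: int = 8):
        super().__init__()
        self.groups = groups
        # One learned (num_joints x num_joints) affinity matrix per group.
        self.affinity = nn.Parameter(torch.randn(groups, num_joints, num_joints) * 0.01)
        self.proj = nn.Conv2d(in_ch, out_ch * groups, kernel_size=1)
        self.out = nn.Conv2d(out_ch * groups, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, T, V) -- batch, channels, frames, joints
        n, _, t, v = x.shape
        feat = self.proj(x).view(n, self.groups, -1, t, v)
        # Aggregate joint features with each group's learned affinity matrix.
        feat = torch.einsum('ngctv,gvw->ngctw', feat, self.affinity)
        return self.out(feat.reshape(n, -1, t, v))


class GroupTCN(nn.Module):
    """Temporal modeling: channels are split into groups, each handled by a
    temporal convolution with a different dilation (varying receptive field)."""

    def __init__(self, channels: int, dilations=(1, 2, 3, 4)):
        super().__init__()
        assert channels % len(dilations) == 0
        gc = channels // len(dilations)
        self.branches = nn.ModuleList(
            nn.Conv2d(gc, gc, kernel_size=(3, 1), padding=(d, 0), dilation=(d, 1))
            for d in dilations
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        chunks = x.chunk(len(self.branches), dim=1)
        return torch.cat([b(c) for b, c in zip(self.branches, chunks)], dim=1)


if __name__ == "__main__":
    x = torch.randn(2, 64, 32, 25)   # (batch, channels, frames, joints)
    y = GroupTCN(128)(DynamicGCN(64, 128)(x))
    print(y.shape)                   # torch.Size([2, 128, 32, 25])
```

The sketch omits pieces the abstract mentions but does not specify, such as the dynamic joint-skeleton fusion module, residual connections, and normalization; it only shows why learned affinities and grouped dilated temporal convolutions make the topology and receptive field dynamic.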