Paper Title

Towards To-a-T Spatio-Temporal Focus for Skeleton-Based Action Recognition

Paper Authors

Lipeng Ke, Kuan-Chuan Peng, Siwei Lyu

Paper Abstract

Graph Convolutional Networks (GCNs) have been widely used to model the high-order dynamic dependencies for skeleton-based action recognition. Most existing approaches do not explicitly embed the high-order spatio-temporal importance into the joints' spatial connection topology and intensity, and they have no direct objective on their attention modules to jointly learn when and where to focus in the action sequence. To address these problems, we propose the To-a-T Spatio-Temporal Focus (STF), a skeleton-based action recognition framework that utilizes the spatio-temporal gradient to focus on relevant spatio-temporal features. We first propose the STF modules with learnable gradient-enforced and instance-dependent adjacency matrices to model the high-order spatio-temporal dynamics. Second, we propose three loss terms defined on the gradient-based spatio-temporal focus to explicitly guide the classifier on when and where to look, to distinguish confusing classes, and to optimize the stacked STF modules. STF outperforms the state-of-the-art methods on the NTU RGB+D 60, NTU RGB+D 120, and Kinetics Skeleton 400 datasets in all 15 settings across different views, subjects, setups, and input modalities, and STF also shows better accuracy in scarce-data and dataset-shift settings.
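
To make the idea of an instance-dependent adjacency matrix concrete, below is a minimal PyTorch sketch of a graph-convolution layer that combines a learnable adjacency shared across samples with one inferred from each input skeleton sequence. This is only an illustration of the general technique the abstract describes, not the authors' released STF implementation; the module and parameter names (InstanceDependentGCN, shared_A, etc.) are hypothetical.

```python
import torch
import torch.nn as nn


class InstanceDependentGCN(nn.Module):
    """Minimal sketch (not the authors' code): a GCN layer whose adjacency
    is the sum of a learnable matrix shared across all samples and an
    instance-dependent matrix inferred from the input sequence."""

    def __init__(self, in_channels, out_channels, num_joints):
        super().__init__()
        # Learnable adjacency shared across samples (V x V), initialized to identity.
        self.shared_A = nn.Parameter(torch.eye(num_joints))
        # 1x1 convolutions producing joint embeddings used to infer a
        # per-instance adjacency from pairwise joint similarity.
        self.theta = nn.Conv2d(in_channels, out_channels // 4, kernel_size=1)
        self.phi = nn.Conv2d(in_channels, out_channels // 4, kernel_size=1)
        self.proj = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        # x: (N, C, T, V) -- batch, channels, frames, joints.
        q = self.theta(x).mean(dim=2)  # pool over time -> (N, C', V)
        k = self.phi(x).mean(dim=2)    # pool over time -> (N, C', V)
        # Instance-dependent adjacency from pairwise joint similarity.
        inst_A = torch.softmax(torch.einsum('ncv,ncw->nvw', q, k), dim=-1)
        A = self.shared_A.unsqueeze(0) + inst_A  # (N, V, V)
        # Graph convolution: project features, then aggregate along A.
        y = self.proj(x)                          # (N, C_out, T, V)
        return torch.einsum('nctv,nvw->nctw', y, A)


# Example: NTU RGB+D skeletons have 25 joints; a batch of 8 clips of 64 frames
# with 3-D joint coordinates as input channels.
layer = InstanceDependentGCN(in_channels=3, out_channels=64, num_joints=25)
out = layer(torch.randn(8, 3, 64, 25))  # -> torch.Size([8, 64, 64, 25])
```

In the paper's full framework, stacked modules of this kind are additionally shaped by gradient-based spatio-temporal focus losses; those loss definitions are not reproduced here.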
