Paper Title
K-Order Graph-oriented Transformer with GraAttention for 3D Pose and Shape Estimation
Paper Authors
Paper Abstract
We propose a novel attention-based 2D-to-3D pose estimation network for graph-structured data, named KOG-Transformer, and a 3D pose-to-shape estimation network for hand data, named GASE-Net. Previous 3D pose estimation methods have focused on various modifications to the graph convolution kernel, such as abandoning weight sharing or increasing the receptive field. Some of these methods employ attention-based non-local modules as auxiliary modules. To better model the relationships between nodes in graph-structured data and fuse the information of different neighbor nodes in a differentiated way, we make targeted modifications to the attention module and propose two modules designed for graph-structured data: graph relative positional encoding multi-head self-attention (GR-MSA) and K-order graph-oriented multi-head self-attention (KOG-MSA). By stacking GR-MSA and KOG-MSA, we obtain a novel network, KOG-Transformer, for 2D-to-3D pose estimation. Furthermore, we propose a network for shape estimation on hand data, called the GraAttention shape estimation network (GASE-Net), which takes a 3D pose as input and gradually models the shape of the hand from sparse to dense. We empirically demonstrate the superiority of KOG-Transformer through extensive experiments. Experimental results show that KOG-Transformer significantly outperforms previous state-of-the-art methods on the benchmark dataset Human3.6M. We evaluate GASE-Net on two publicly available hand datasets, ObMan and InterHand2.6M. GASE-Net can predict the corresponding shape for an input pose with strong generalization ability.
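The abstract does not specify how GR-MSA injects graph structure into attention, so the following is only a minimal NumPy sketch of one plausible reading, not the authors' implementation: standard multi-head self-attention over the skeleton's joints, plus a learnable per-head bias indexed by the hop distance between joints in the skeleton graph. The function names (`hop_distances`, `gr_msa`) and the bias-table parameterization are assumptions introduced here for illustration.

```python
import numpy as np

def hop_distances(adj):
    """All-pairs shortest hop counts on a small graph (BFS via matrix products)."""
    n = adj.shape[0]
    dist = np.full((n, n), n, dtype=int)   # n acts as an "unreachable" sentinel
    np.fill_diagonal(dist, 0)
    reach = np.eye(n, dtype=bool)
    frontier = np.eye(n, dtype=bool)
    for k in range(1, n):
        frontier = ((frontier.astype(int) @ adj) > 0) & ~reach
        dist[frontier] = k
        reach |= frontier
        if not frontier.any():
            break
    return dist

def gr_msa(x, adj, Wq, Wk, Wv, bias_table, num_heads):
    """Multi-head self-attention with a learnable per-head bias indexed by hop distance.

    x: (n_nodes, d_model) joint features, adj: (n_nodes, n_nodes) 0/1 adjacency,
    bias_table: (num_heads, max_hop + 1) relative-position biases (hypothetical form).
    """
    n, d = x.shape
    dh = d // num_heads
    hops = np.minimum(hop_distances(adj), bias_table.shape[1] - 1)
    q = (x @ Wq).reshape(n, num_heads, dh).transpose(1, 0, 2)
    k = (x @ Wk).reshape(n, num_heads, dh).transpose(1, 0, 2)
    v = (x @ Wv).reshape(n, num_heads, dh).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)
    scores = scores + bias_table[:, hops]          # (heads, n, n) graph-aware bias
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return (attn @ v).transpose(1, 0, 2).reshape(n, d)
```

Indexing the bias by hop distance lets each head weight first-order neighbors differently from more distant joints, which matches the abstract's stated goal of fusing different neighbor orders in a differentiated way.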