Diverse Dance Synthesis via Keyframes with Transformer Controllers

Paper Title

Diverse Dance Synthesis via Keyframes with Transformer Controllers

Authors

Pan, Junjun; Wang, Siyuan; Bai, Junxuan; Dai, Ju

Abstract

Existing keyframe-based motion synthesis mainly focuses on the generation of cyclic actions or short-term motion, such as walking, running, and transitions between close postures. However, these methods will significantly degrade the naturalness and diversity of the synthesized motion when dealing with complex and impromptu movements, e.g., dance performance and martial arts. In addition, current research lacks fine-grained control over the generated motion, which is essential for intelligent human-computer interaction and animation creation. In this paper, we propose a novel keyframe-based motion generation network based on multiple constraints, which can achieve diverse dance synthesis via learned knowledge. Specifically, the algorithm is mainly formulated based on the recurrent neural network (RNN) and the Transformer architecture. The backbone of our network is a hierarchical RNN module composed of two long short-term memory (LSTM) units, in which the first LSTM is utilized to embed the posture information of the historical frames into a latent space, and the second one is employed to predict the human posture for the next frame. Moreover, our framework contains two Transformer-based controllers, which are used to model the constraints of the root trajectory and the velocity factor respectively, so as to better utilize the temporal context of the frames and achieve fine-grained motion control. We verify the proposed approach on a dance dataset containing a wide range of contemporary dance. The results of three quantitative analyses validate the superiority of our algorithm. The video and qualitative experimental results demonstrate that the complex motion sequences generated by our algorithm can achieve diverse and smooth motion transitions between keyframes, even for long-term synthesis.
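
The data flow described in the abstract (a first LSTM that embeds the historical poses into a latent space, and a second LSTM that predicts the next pose under control signals) can be illustrated with a minimal, untrained sketch in pure Python. This is not the authors' implementation. The names `LSTMCell` and `synthesize`, the dimensions, and the random weights are all hypothetical. The per-frame control vectors merely stand in for the outputs of the two Transformer-based controllers (root trajectory and velocity factor), which are not implemented here, and keyframe constraints are omitted entirely.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Minimal LSTM cell with random, untrained weights (illustration only)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, candidate, output.
        self.w = [[[rng.uniform(-0.1, 0.1) for _ in range(n)]
                   for _ in range(hidden_size)] for _ in range(4)]
        self.b = [[0.0] * hidden_size for _ in range(4)]

    def zero_state(self):
        return [0.0] * self.hidden_size, [0.0] * self.hidden_size

    def step(self, x, state):
        h, c = state
        z = x + h  # concatenate input and previous hidden state

        def gate(k, act):
            return [act(sum(w * v for w, v in zip(row, z)) + b)
                    for row, b in zip(self.w[k], self.b[k])]

        i, f = gate(0, sigmoid), gate(1, sigmoid)
        g, o = gate(2, math.tanh), gate(3, sigmoid)
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def synthesize(history, controls, pose_dim, latent_dim=8, seed=0):
    """Roll out one pose per control vector, autoregressively.

    history:  list of past poses (each a list of pose_dim floats)
    controls: list of control vectors, one per frame to generate
              (assumed stand-in for the Transformer controllers' outputs)
    """
    rng = random.Random(seed)
    ctrl_dim = len(controls[0])
    encoder = LSTMCell(pose_dim, latent_dim, rng)               # first LSTM
    predictor = LSTMCell(latent_dim + ctrl_dim, pose_dim, rng)  # second LSTM

    # Embed the historical frames into the latent space.
    enc_state = encoder.zero_state()
    for pose in history:
        enc_state = encoder.step(pose, enc_state)

    pred_state = predictor.zero_state()
    frames = []
    for ctrl in controls:
        latent = enc_state[0]
        pred_state = predictor.step(latent + ctrl, pred_state)
        # Simplification: the predictor's hidden state is used as the pose.
        next_pose = pred_state[0]
        frames.append(next_pose)
        # Feed the predicted pose back so it becomes part of the history.
        enc_state = encoder.step(next_pose, enc_state)
    return frames


if __name__ == "__main__":
    past = [[0.1, 0.2, 0.3]] * 4   # four historical 3-D "poses"
    ctrl = [[0.0, 1.0]] * 5        # five hypothetical control vectors
    for frame in synthesize(past, ctrl, pose_dim=3):
        print([round(v, 4) for v in frame])
```

In a real system the weights would be learned from motion data, and the pose would typically be produced by a separate output layer rather than read directly from the hidden state. The sketch only shows how the two recurrent stages and the control signals might connect.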
