Paper Title

Freeform Body Motion Generation from Speech

Authors

Jing Xu, Wei Zhang, Yalong Bai, Qibin Sun, Tao Mei

Abstract

People naturally perform spontaneous body motions to enhance their speech while giving talks. Body motion generation from speech is inherently difficult due to the non-deterministic mapping from speech to body motions. Most existing works map speech to motion in a deterministic way by conditioning on certain styles, leading to sub-optimal results. Motivated by studies in linguistics, we decompose co-speech motion into two complementary parts: pose modes and rhythmic dynamics. Accordingly, we introduce a novel freeform motion generation model (FreeMo) equipped with a two-stream architecture, i.e., a pose mode branch for primary posture generation and a rhythmic motion branch for rhythmic dynamics synthesis. On one hand, diverse pose modes are generated by conditional sampling in a latent space, guided by speech semantics. On the other hand, rhythmic dynamics are synced with the speech prosody. Extensive experiments demonstrate superior performance over several baselines in terms of motion diversity, quality, and synchronization with speech. Code and pre-trained models will be publicly available through https://github.com/TheTempAccount/Co-Speech-Motion-Generation.
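The two-stream design in the abstract can be illustrated with a minimal numerical sketch. This is not the authors' FreeMo implementation: the function name, dimensions, and random stand-in weights are all hypothetical. It only shows the decomposition the abstract describes, i.e., a pose-mode branch that samples a primary posture from a semantics-conditioned latent Gaussian, and a rhythmic branch that adds per-frame dynamics derived from prosody features.

```python
import numpy as np

rng = np.random.default_rng(0)

def two_stream_motion(semantics, prosody, latent_dim=16, pose_dim=24):
    """Hypothetical sketch of a two-stream co-speech motion generator.

    semantics: (batch, sem_dim) utterance-level semantic features
    prosody:   (batch, frames, pros_dim) per-frame prosody features
    Returns:   (batch, frames, pose_dim) generated pose sequence
    All projection weights below are random stand-ins, not trained parameters.
    """
    sem_dim = semantics.shape[-1]
    pros_dim = prosody.shape[-1]

    # Pose-mode branch: project semantics to latent Gaussian parameters,
    # sample a latent code, and decode it into one primary posture.
    W_mu = rng.normal(size=(sem_dim, latent_dim))
    W_lv = rng.normal(size=(sem_dim, latent_dim))
    mu, logvar = semantics @ W_mu, semantics @ W_lv
    z = mu + rng.normal(size=mu.shape) * np.exp(0.5 * logvar)  # conditional sample
    W_dec = rng.normal(size=(latent_dim, pose_dim))
    base_pose = (z @ W_dec)[:, None, :]            # (batch, 1, pose_dim)

    # Rhythmic branch: map per-frame prosody to pose offsets,
    # which modulate the primary posture over time.
    W_rhy = rng.normal(size=(pros_dim, pose_dim))
    offsets = prosody @ W_rhy                      # (batch, frames, pose_dim)

    return base_pose + offsets                     # dynamics around the posture

motion = two_stream_motion(rng.normal(size=(2, 64)), rng.normal(size=(2, 30, 32)))
print(motion.shape)  # (2, 30, 24)
```

Because the posture comes from a sampled latent, repeated calls with the same speech input yield different but plausible pose modes, which is the source of the "motion diversity" the abstract claims over deterministic mappings.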
