Paper Title
One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing
Paper Authors
Paper Abstract
We propose a neural talking-head video synthesis model and demonstrate its application to video conferencing. Our model learns to synthesize a talking-head video using a source image containing the target person's appearance and a driving video that dictates the motion in the output. The motion is encoded with a novel keypoint representation, in which identity-specific and motion-related information is decomposed in an unsupervised manner. Extensive experimental validation shows that our model outperforms competing methods on benchmark datasets. Moreover, our compact keypoint representation enables a video conferencing system that achieves the same visual quality as the commercial H.264 standard while using only one-tenth of the bandwidth. In addition, we show that our keypoint representation allows the user to rotate the head during synthesis, which is useful for simulating a face-to-face video conferencing experience.
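To make the bandwidth claim concrete, the sketch below gives a rough back-of-envelope comparison between transmitting a compact per-frame keypoint payload and streaming H.264 video. All parameters (keypoint count, pose values, bytes per value, frame rate, and the H.264 reference bitrate) are illustrative assumptions, not figures taken from the paper.

```python
# Back-of-envelope comparison: per-frame keypoint payload vs. an H.264 stream.
# All parameters below are illustrative assumptions, not values from the paper.

NUM_KEYPOINTS = 20        # assumed number of 3D keypoints per frame
VALUES_PER_KEYPOINT = 3   # x, y, z coordinates
POSE_VALUES = 6           # assumed head rotation (3) + translation (3)
BYTES_PER_VALUE = 4       # 32-bit floats, before any entropy coding
FPS = 30                  # assumed frame rate
H264_BITRATE_KBPS = 500   # assumed reference bitrate for a talking-head stream


def keypoint_bitrate_kbps() -> float:
    """Raw bitrate needed to send keypoints plus head pose every frame."""
    values_per_frame = NUM_KEYPOINTS * VALUES_PER_KEYPOINT + POSE_VALUES
    bits_per_frame = values_per_frame * BYTES_PER_VALUE * 8
    return bits_per_frame * FPS / 1000.0


if __name__ == "__main__":
    kp_kbps = keypoint_bitrate_kbps()
    print(f"Keypoint payload: {kp_kbps:.1f} kbps")
    print(f"H.264 reference:  {H264_BITRATE_KBPS} kbps")
    print(f"Ratio: {H264_BITRATE_KBPS / kp_kbps:.1f}x smaller")
```

This ignores the one-time transmission of the source image and any further compression of the keypoint stream; it is only meant to illustrate why a keypoint-based representation can sit roughly an order of magnitude below a conventional video codec in bandwidth.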