Paper Title
ClothFormer: Taming Video Virtual Try-on in All Module
Paper Authors
Paper Abstract
The task of video virtual try-on aims to fit the target clothes to a person in a video with spatio-temporal consistency. Despite tremendous progress in image-based virtual try-on, existing methods lead to inconsistency between frames when applied to videos. Limited work has explored the task of video-based virtual try-on, but it fails to produce visually pleasing and temporally coherent results. Moreover, there are two other key challenges: 1) how to generate accurate warping when occlusions appear in the clothing region; 2) how to generate clothes and non-target body parts (e.g., arms, neck) in harmony with a complicated background. To address these issues, we propose a novel video virtual try-on framework, ClothFormer, which successfully synthesizes realistic, harmonious, and spatio-temporally consistent results in complicated environments. In particular, ClothFormer involves three major modules. First, a two-stage anti-occlusion warping module predicts an accurate dense flow mapping between the body regions and the clothing regions. Second, an appearance-flow tracking module utilizes ridge regression and optical flow correction to smooth the dense flow sequence and generate a temporally smooth warped clothing sequence. Third, a dual-stream transformer extracts and fuses clothing textures, person features, and environment information to generate realistic try-on videos. Through rigorous experiments, we demonstrate that our method surpasses the baselines by a large margin in terms of synthesized video quality, both qualitatively and quantitatively.
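To make the appearance-flow tracking idea concrete, the sketch below shows one way ridge regression can temporally smooth a dense flow sequence: each pixel's flow values are regressed against the frame index with an L2 penalty and replaced by the fitted trajectory. This is a minimal illustrative interpretation, not the authors' implementation (which additionally applies optical-flow correction); the array shape `(T, 2, H, W)`, the linear-in-time model, the regularization weight `lam`, and the function name `ridge_smooth_flow` are assumptions made for this example.

```python
# Illustrative sketch (not the paper's code): per-pixel ridge regression over
# the frame index to temporally smooth a dense appearance-flow sequence.
import numpy as np


def ridge_smooth_flow(flows: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Fit flow[t] ~ a*t + b per pixel/channel with an L2 penalty; return the fit.

    flows: array of shape (T, 2, H, W), one 2-channel flow map per frame.
    """
    T = flows.shape[0]
    t = np.arange(T, dtype=np.float64)
    X = np.stack([t, np.ones(T)], axis=1)        # (T, 2) design matrix [t, 1]
    Y = flows.reshape(T, -1).astype(np.float64)  # (T, 2*H*W) regression targets

    # Closed-form ridge solution: W = (X^T X + lam*I)^(-1) X^T Y
    A = X.T @ X + lam * np.eye(2)
    W = np.linalg.solve(A, X.T @ Y)              # (2, 2*H*W) coefficients

    smoothed = (X @ W).reshape(flows.shape)      # fitted (temporally smoothed) flows
    return smoothed.astype(flows.dtype)


if __name__ == "__main__":
    # Toy example: a jittery 8-frame flow sequence on a 4x4 grid.
    rng = np.random.default_rng(0)
    flows = rng.normal(size=(8, 2, 4, 4)).astype(np.float32)
    print(ridge_smooth_flow(flows).shape)        # -> (8, 2, 4, 4)
```

In practice such a smoothed flow sequence would still need per-frame correction (e.g., with estimated optical flow, as the abstract states) so that fine-grained motion is not over-smoothed away.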