MARLIN: Masked Autoencoder for facial video Representation LearnINg

Paper Title

MARLIN: Masked Autoencoder for facial video Representation LearnINg

Authors

Cai, Zhixi; Ghosh, Shreya; Stefanov, Kalin; Dhall, Abhinav; Cai, Jianfei; Rezatofighi, Hamid; Haffari, Reza; Hayat, Munawar

Abstract

This paper proposes a self-supervised approach for learning universal facial representations from videos that transfer across a variety of facial analysis tasks, such as Facial Attribute Recognition (FAR), Facial Expression Recognition (FER), DeepFake Detection (DFD), and Lip Synchronization (LS). Our proposed framework, named MARLIN, is a facial video masked autoencoder that learns highly robust and generic facial embeddings from abundantly available, non-annotated, web-crawled facial videos. As a challenging auxiliary task, MARLIN reconstructs the spatio-temporal details of the face from densely masked facial regions, mainly the eyes, nose, mouth, lips, and skin, to capture local and global aspects that in turn help in encoding generic and transferable features. Through a variety of experiments on diverse downstream tasks, we demonstrate that MARLIN is an excellent facial video encoder and feature extractor that performs consistently well across tasks, including FAR (1.13% gain over the supervised benchmark), FER (2.64% gain over the unsupervised benchmark), DFD (1.86% gain over the unsupervised benchmark), and LS (29.36% gain in Fréchet Inception Distance), even in the low-data regime. Our code and models are available at: https://github.com/ControlNet/MARLIN
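The masked-reconstruction pretext task described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the patch size (2 frames x 16 x 16 pixels), the 90% masking ratio, and the helper names `patchify`, `random_dense_mask`, and `masked_mse` are all assumptions made for illustration. Uniform random masking is used here as a stand-in for masking guided by facial regions, which would require a face parser.

```python
import numpy as np

def patchify(video, t=2, p=16):
    """Split a (T, H, W, C) clip into flattened t x p x p spatio-temporal patches."""
    T, H, W, C = video.shape
    assert T % t == 0 and H % p == 0 and W % p == 0
    x = video.reshape(T // t, t, H // p, p, W // p, p, C)
    # Bring the three patch-grid axes to the front, patch contents to the back.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, t * p * p * C)

def random_dense_mask(num_patches, mask_ratio, rng):
    """Boolean mask over patches; True marks a patch hidden from the encoder."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask

def masked_mse(pred, target, mask):
    """Mean squared reconstruction error computed on masked patches only."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))

rng = np.random.default_rng(0)
clip = rng.random((16, 224, 224, 3), dtype=np.float32)  # synthetic stand-in for a face clip
patches = patchify(clip)                                # one row per spatio-temporal patch
mask = random_dense_mask(len(patches), 0.9, rng)        # assumed 90% masking ratio
visible = patches[~mask]                                # the only input an encoder would see
pred = np.zeros_like(patches)                           # placeholder for a decoder's output
loss = masked_mse(pred, patches, mask)
```

In a real model, `visible` would pass through an encoder and a decoder would predict the hidden patches. Restricting the loss to masked positions is what forces the representation to infer missing facial detail from context.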
