Paper Title
Mutual Information Regularized Identity-aware Facial Expression Recognition in Compressed Video
Paper Authors
Paper Abstract
How to extract effective expression representations that are invariant to identity-specific attributes is a long-standing problem in facial expression recognition (FER). Most previous methods process the RGB images of a sequence, while we argue that off-the-shelf and valuable expression-related muscle movement is already embedded in the compressed format. In this paper, we aim to explore facial expression representations with inter-subject variations eliminated in the compressed video domain. In this domain, compressed by up to two orders of magnitude, we can explicitly infer the expression from the residual frames and extract identity factors from the I-frame with a pre-trained face recognition network. By enforcing their marginal independence, the expression feature is expected to be purer and more robust to identity shifts. Specifically, we propose a novel collaborative min-min game for mutual information (MI) minimization in the latent space. We need neither identity labels nor multiple expression samples from the same person to eliminate identity. Moreover, when the apex frame is annotated in the dataset, a complementary constraint can be added to further regularize the feature-level game. At test time, only the compressed residual frames are required for expression prediction. Our solution achieves comparable or better performance than recent decoded-image-based methods on typical FER benchmarks, with about 3 times faster inference.
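To make the MI-minimization idea concrete, below is a minimal sketch of a feature-level min-min game between an expression embedding z_exp (from residual frames) and an identity embedding z_id (from the I-frame). It assumes a CLUB-style variational upper bound on I(z_exp; z_id) as the estimator; this is one common choice, not necessarily the paper's exact formulation, and the module name `MIUpperBound`, the dimensions, and the learning rate are all illustrative.

```python
# Sketch: MI minimization between expression and identity features.
# Assumption: a CLUB-style variational upper bound q(z_id | z_exp) stands in
# for the paper's estimator; all names and hyperparameters are illustrative.
import torch
import torch.nn as nn

class MIUpperBound(nn.Module):
    """Variational upper bound on I(z_exp; z_id) via Gaussian q(z_id | z_exp)."""
    def __init__(self, dim_exp=256, dim_id=256, hidden=512):
        super().__init__()
        self.mu = nn.Sequential(nn.Linear(dim_exp, hidden), nn.ReLU(),
                                nn.Linear(hidden, dim_id))
        self.logvar = nn.Sequential(nn.Linear(dim_exp, hidden), nn.ReLU(),
                                    nn.Linear(hidden, dim_id), nn.Tanh())

    def loglik(self, z_exp, z_id):
        # Log-likelihood of matched pairs under q (up to an additive constant);
        # maximized to fit the variational approximation.
        mu, logvar = self.mu(z_exp), self.logvar(z_exp)
        return (-(mu - z_id) ** 2 / logvar.exp() - logvar).sum(1).mean()

    def mi_est(self, z_exp, z_id):
        # Upper-bound estimate: matched-pair likelihood minus the average
        # likelihood over shuffled (unmatched) pairs in the batch.
        mu, logvar = self.mu(z_exp), self.logvar(z_exp)
        pos = (-(mu - z_id) ** 2 / logvar.exp() - logvar).sum(1)
        neg = (-(mu.unsqueeze(1) - z_id.unsqueeze(0)) ** 2
               / logvar.exp().unsqueeze(1)
               - logvar.unsqueeze(1)).sum(2).mean(1)
        return (pos - neg).mean()

est = MIUpperBound()
opt_est = torch.optim.Adam(est.parameters(), lr=1e-4)

def mi_step(z_exp, z_id):
    # Player 1: fit q(z_id | z_exp) by minimizing negative log-likelihood.
    opt_est.zero_grad()
    (-est.loglik(z_exp.detach(), z_id.detach())).backward()
    opt_est.step()
    # Player 2: return the MI estimate as an extra loss term that the
    # expression encoder minimizes alongside its classification loss.
    return est.mi_est(z_exp, z_id)
```

Note the structure this sketch illustrates: unlike an adversarial min-max game, both players minimize their own objectives (the estimator minimizes a negative log-likelihood, the encoder minimizes the resulting MI bound), which matches the "collaborative min-min" phrasing in the abstract and avoids the instability of a saddle-point search.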