Landmark Guidance Independent Spatio-channel Attention and Complementary Context Information based Facial Expression Recognition

Paper Title

Landmark Guidance Independent Spatio-channel Attention and Complementary Context Information based Facial Expression Recognition

Authors

Gera, Darshan; Balasubramanian, S.

Abstract

A recent trend in recognizing facial expressions in real-world scenarios is to deploy attention-based convolutional neural networks (CNNs) locally to signify the importance of facial regions, and to combine this local attention with global facial features and/or other complementary context information for performance gain. However, in the presence of occlusions and pose variations, different channels respond differently, and the response intensity of a channel differs across spatial locations. Also, modern facial expression recognition (FER) architectures rely on external sources such as landmark detectors to define attention, so a failure of the landmark detector has a cascading effect on FER. Additionally, no emphasis is laid on the relevance of the features that are input to compute complementary context information. Building on these observations, this work proposes an end-to-end architecture for FER that obtains both local and global attention per channel per spatial location through a novel spatio-channel attention net (SCAN), without seeking any information from landmark detectors. SCAN is complemented by a complementary context information (CCI) branch. Further, efficient channel attention (ECA) is used to attend to the relevance of the features input to CCI. The representation learnt by the proposed architecture is robust to occlusions and pose variations. The robustness and superior performance of the proposed model are demonstrated on both in-lab and in-the-wild datasets (AffectNet, FERPlus, RAF-DB, FED-RO, SFEW, CK+, Oulu-CASIA and JAFFE), along with a couple of constructed face mask datasets resembling masked faces in the COVID-19 scenario. Code is publicly available at https://github.com/1980x/SCAN-CCI-FER
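The abstract mentions efficient channel attention (ECA) without detail. As a rough illustration of the general ECA idea (global average pooling per channel, a small 1-D convolution across neighbouring channels, a sigmoid gate, then channel-wise rescaling), here is a minimal NumPy sketch. It is not the authors' implementation: the function name `eca`, the fixed kernel values, and the feature-map shape are assumptions made for illustration. In a trained network the kernel would be learned, and how the gate is wired into the CCI branch is not specified in the abstract.

```python
import numpy as np


def eca(x, kernel):
    """Sketch of an ECA-style gate (hypothetical helper, not the paper's code).

    x:      feature map of shape (C, H, W)
    kernel: 1-D array of odd length k; fixed here, learned in a real model
    Returns the rescaled feature map and the per-channel weights.
    """
    k = kernel.shape[0]
    if k % 2 != 1:
        raise ValueError("kernel length must be odd")

    # Squeeze: one scalar descriptor per channel (global average pooling).
    pooled = x.mean(axis=(1, 2))                      # shape (C,)

    # Local cross-channel interaction: zero-pad so the output keeps length C,
    # then slide the kernel over the channel axis (cross-correlation).
    padded = np.pad(pooled, k // 2)                   # shape (C + k - 1,)
    mixed = np.correlate(padded, kernel, mode="valid")  # shape (C,)

    # Gate: sigmoid maps each channel score into (0, 1).
    weights = 1.0 / (1.0 + np.exp(-mixed))

    # Excite: rescale every channel by its weight.
    return x * weights[:, None, None], weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.standard_normal((8, 4, 4))         # assumed toy shape
    out, w = eca(features, np.array([0.25, 0.5, 0.25]))
    print(out.shape, w.shape)
```

Because the gate uses only a k-tap convolution over the pooled channel vector, it adds k parameters regardless of the number of channels, which is what makes this style of channel attention lightweight.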
