Improving Depression estimation from facial videos with face alignment, training optimization and scheduling

Paper title

Improving Depression estimation from facial videos with face alignment, training optimization and scheduling

Authors

Manuel Lage Cañellas, Constantino Álvarez Casado, Le Nguyen, Miguel Bordallo López

Abstract

Deep learning models have shown promising results in recognizing depressive states from video-based facial expressions. While successful models typically leverage 3D-CNNs or video distillation techniques, the differing use of pretraining, data augmentation, preprocessing, and optimization techniques across experiments makes fair architectural comparisons difficult. We propose instead to enhance two simple models based on ResNet-50 that use only static spatial information, by applying two specific face alignment methods and improved data augmentation, optimization, and scheduling techniques. In our extensive experiments on benchmark datasets, single streams obtain results similar to those of sophisticated spatio-temporal models, while the score-level fusion of two different streams outperforms state-of-the-art methods. Our findings suggest that specific modifications to the preprocessing and training process result in noticeable differences in model performance and could hide the actual differences originally attributed to the use of different neural network architectures.
