Paper Title

Learning Spatial Attention for Face Super-Resolution

Paper Authors

Chen, Chaofeng, Gong, Dihong, Wang, Hao, Li, Zhifeng, Wong, Kwan-Yee K.

Paper Abstract

General image super-resolution techniques have difficulty recovering detailed face structures when applied to low-resolution face images. Recent deep-learning-based methods tailored for face images have achieved improved performance through joint training with additional tasks such as face parsing and landmark prediction. However, multi-task learning requires extra manually labeled data. Besides, most existing works can only generate relatively low-resolution face images (e.g., $128\times128$), so their applications are limited. In this paper, we introduce a novel SPatial Attention Residual Network (SPARNet), built on our newly proposed Face Attention Units (FAUs), for face super-resolution. Specifically, we introduce a spatial attention mechanism to the vanilla residual blocks. This enables the convolutional layers to adaptively bootstrap features related to the key face structures and pay less attention to less feature-rich regions. This makes training more effective and efficient, as the key face structures account for only a very small portion of the face image. Visualization of the attention maps shows that our spatial attention network can capture the key face structures well even for very low-resolution faces (e.g., $16\times16$). Quantitative comparisons on various metrics (including PSNR, SSIM, identity similarity, and landmark detection) demonstrate the superiority of our method over the current state of the art. We further extend SPARNet with multi-scale discriminators, named SPARNetHD, to produce high-resolution results (i.e., $512\times512$). We show that SPARNetHD trained with synthetic data can not only produce high-quality, high-resolution outputs for synthetically degraded face images but also generalize well to real-world low-quality face images.
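The abstract's core idea, a residual block whose convolutional branch is modulated by a per-location spatial attention map before the skip connection is added, can be sketched as follows. This is a minimal NumPy illustration of the general pattern, not the paper's implementation; the function name, tensor shapes, and the use of a sigmoid-normalized single-channel attention map are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def face_attention_unit(feat, conv_feat, attn_logits):
    """Hypothetical sketch of a Face Attention Unit (FAU).

    feat        : (C, H, W) input features to the block
    conv_feat   : (C, H, W) output of the block's convolutional branch
    attn_logits : (1, H, W) logits from a hypothetical attention branch

    The attention map weights each spatial location, so locations
    covering key face structures can contribute more to the residual
    update than less feature-rich regions.
    """
    attn_map = sigmoid(attn_logits)      # one weight in (0, 1) per spatial location
    return feat + conv_feat * attn_map   # attention-modulated residual connection

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16))
conv_feat = rng.standard_normal((8, 16, 16))
attn_logits = rng.standard_normal((1, 16, 16))

out = face_attention_unit(feat, conv_feat, attn_logits)
print(out.shape)  # (8, 16, 16)
```

In the actual network both the convolutional branch and the attention branch would be learned layers; here they are stand-in arrays so the modulation step itself is visible.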
