Paper Title
A Perceptually Optimized and Self-Calibrated Tone Mapping Operator
Authors
Abstract
With the increasing popularity and accessibility of high dynamic range (HDR) photography, tone mapping operators (TMOs) for dynamic range compression are in practical demand. In this paper, we develop a two-stage neural network-based TMO that is self-calibrated and perceptually optimized. In Stage one, motivated by the physiology of the early stages of the human visual system, we first decompose an HDR image into a normalized Laplacian pyramid. We then use two lightweight deep neural networks (DNNs) that take the normalized representation as input and estimate the Laplacian pyramid of the corresponding low dynamic range (LDR) image. We optimize the tone mapping network by minimizing the normalized Laplacian pyramid distance (NLPD), a perceptual metric that aligns with human judgments of tone-mapped image quality. In Stage two, the input HDR image is self-calibrated to compute the final LDR image. We feed the same HDR image, rescaled to different maximum luminances, to the learned tone mapping network and generate a pseudo-multi-exposure image stack with varying detail visibility and color saturation. We then train another lightweight DNN to fuse the LDR image stack into a desired LDR image by maximizing a variant of the structural similarity index for multi-exposure image fusion (MEF-SSIM), which has been shown to be perceptually relevant to fused image quality. The proposed self-calibration mechanism through MEF enables our TMO to accept uncalibrated HDR images while remaining physiology-driven. Extensive experiments show that our method produces images with consistently better visual quality. Additionally, since our method builds upon three lightweight DNNs, it is among the fastest local TMOs.
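The two non-learned steps mentioned in the abstract, rescaling one HDR image to several maximum luminances and decomposing a signal into a Laplacian pyramid, can be illustrated with a minimal sketch. This is not the authors' implementation: it works on a 1-D list of luminance values, uses a simple box filter in place of the usual Gaussian filter, omits the pyramid normalization and all three DNNs, and the function names and peak luminance values are hypothetical choices made for illustration only.

```python
def rescale_to_peak(hdr, peak):
    """Scale luminance values so that their maximum equals `peak`."""
    m = max(hdr)
    if m <= 0:
        raise ValueError("HDR signal must contain a positive luminance value")
    return [v * peak / m for v in hdr]


def pseudo_exposure_inputs(hdr, peaks):
    """Build the rescaled copies of one HDR signal (hypothetical peak values).

    In the paper each copy would be passed through the learned tone mapping
    network; that network is not modeled here.
    """
    return [rescale_to_peak(hdr, p) for p in peaks]


def downsample(x):
    """Halve the length by averaging adjacent pairs (box filter, an assumption)."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]


def upsample(x, n):
    """Nearest-neighbour upsampling of x to length n."""
    return [x[min(i // 2, len(x) - 1)] for i in range(n)]


def laplacian_pyramid(x, levels):
    """Return up to `levels` band-pass residuals followed by the low-pass remainder."""
    bands = []
    cur = list(x)
    for _ in range(levels):
        if len(cur) < 2:  # nothing left to split
            break
        low = downsample(cur)
        up = upsample(low, len(cur))
        bands.append([a - b for a, b in zip(cur, up)])
        cur = low
    bands.append(cur)
    return bands


def reconstruct(bands):
    """Invert laplacian_pyramid by adding each residual back, coarse to fine."""
    cur = bands[-1]
    for band in reversed(bands[:-1]):
        up = upsample(cur, len(band))
        cur = [a + b for a, b in zip(band, up)]
    return cur


# Toy 1-D "HDR scanline" with a large dynamic range (made-up values).
hdr = [0.5, 2.0, 8.0, 4000.0, 120.0, 30.0, 1.0, 0.1]
stack = pseudo_exposure_inputs(hdr, [100.0, 1000.0, 10000.0])
bands = laplacian_pyramid(stack[0], levels=2)
restored = reconstruct(bands)
```

Because each residual stores exactly what the coarser level cannot represent, `reconstruct` recovers the rescaled signal up to floating-point rounding, which is the property that lets a network predict an LDR pyramid and then collapse it into an image.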