Paper Title
MEStereo-Du2CNN: A Novel Dual Channel CNN for Learning Robust Depth Estimates from Multi-exposure Stereo Images for HDR 3D Applications
Paper Authors
Paper Abstract
Display technologies have evolved over the years. It is critical to develop practical HDR capturing, processing, and display solutions to bring 3D technologies to the next level. Depth estimation of multi-exposure stereo image sequences is an essential task in the development of cost-effective 3D HDR video content. In this paper, we develop a novel deep architecture for multi-exposure stereo depth estimation. The proposed architecture has two novel components. First, the stereo matching technique used in traditional stereo depth estimation is revamped. For the stereo depth estimation component of our architecture, a mono-to-stereo transfer learning approach is deployed. The proposed formulation dispenses with cost volume construction, which is replaced by a ResNet-based dual-encoder single-decoder CNN with different weights for feature fusion. EfficientNet-based blocks are used to learn the disparity. Second, we combine disparity maps obtained from the stereo images at different exposure levels using a robust disparity feature fusion approach. The disparity maps obtained at different exposures are merged using weight maps computed from different quality measures. The final predicted disparity map is more robust and retains the best features that preserve depth discontinuities. The proposed CNN offers the flexibility to be trained with standard dynamic range stereo data or with multi-exposure low dynamic range stereo sequences. In terms of performance, the proposed model surpasses state-of-the-art monocular and stereo depth estimation methods, both quantitatively and qualitatively, on the challenging Scene Flow and differently exposed Middlebury stereo datasets. The architecture performs exceedingly well on complex natural scenes, demonstrating its usefulness for diverse 3D HDR applications.
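To make the two components concrete, the following is a minimal PyTorch sketch of (a) a dual-encoder single-decoder disparity network without an explicit cost volume and (b) a weight-map based fusion of per-exposure disparity maps. It is only an illustration under stated assumptions: the backbone choice, channel widths, fusion operator, and the contrast/well-exposedness quality measures are hypothetical simplifications (e.g., a plain convolutional decoder stands in for the paper's EfficientNet-based blocks) and are not taken from the paper.

```python
# Illustrative sketch only; module names, widths, and quality measures are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18


class DualEncoderDisparityNet(nn.Module):
    """Two ResNet encoders (left/right view) with learned feature fusion,
    followed by a single convolutional decoder that predicts disparity."""

    def __init__(self):
        super().__init__()

        def encoder():
            r = resnet18(weights=None)  # mono-pretrained weights could be loaded here
            return nn.Sequential(*list(r.children())[:-2])  # keep conv backbone, drop pool/fc

        self.enc_left = encoder()
        self.enc_right = encoder()
        # 1x1 convolution fuses the two feature stacks instead of building a cost volume.
        self.fuse = nn.Conv2d(2 * 512, 256, kernel_size=1)
        self.decoder = nn.Sequential(
            nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 3, padding=1),  # single-channel disparity
        )

    def forward(self, left, right):
        feats = torch.cat([self.enc_left(left), self.enc_right(right)], dim=1)
        disp = self.decoder(self.fuse(feats))
        # Upsample the coarse prediction back to the input resolution.
        return F.interpolate(disp, size=left.shape[-2:], mode="bilinear", align_corners=False)


def fuse_disparities(disparities, images):
    """Merge per-exposure disparity maps with weight maps built from simple
    quality measures (local contrast and well-exposedness) of the input images."""
    weights = []
    lap_kernel = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]]).view(1, 1, 3, 3)
    for img in images:  # each img: (B, 3, H, W) in [0, 1]
        gray = img.mean(dim=1, keepdim=True)
        contrast = F.conv2d(gray, lap_kernel.to(img.device), padding=1).abs()
        well_exposed = torch.exp(-((gray - 0.5) ** 2) / (2 * 0.2 ** 2))
        weights.append(contrast * well_exposed + 1e-6)
    weights = torch.stack(weights)                       # (E, B, 1, H, W)
    weights = weights / weights.sum(dim=0, keepdim=True)  # normalise across exposures
    return (weights * torch.stack(disparities)).sum(dim=0)
```

As a usage sketch, one instance of `DualEncoderDisparityNet` would be run per exposure level on the corresponding left/right pair, and `fuse_disparities` would then merge the per-exposure predictions into a single map; in the paper the fusion is driven by learned/robust quality weights rather than these hand-picked measures.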