Pyramid Frequency Network with Spatial Attention Residual Refinement Module for Monocular Depth Estimation

Title

Pyramid Frequency Network with Spatial Attention Residual Refinement Module for Monocular Depth Estimation

Authors

Lu, Zhengyang; Chen, Ying

Abstract

Deep-learning-based approaches to depth estimation are advancing rapidly, offering better performance than existing methods. To estimate depth in real-world scenarios, a depth estimation model must be robust to a variety of noise conditions. This work proposes a Pyramid Frequency Network (PFN) with a Spatial Attention Residual Refinement Module (SARRM) to address the weak robustness of existing deep-learning methods. To reconstruct depth maps with accurate details, the SARRM applies a residual fusion method with an attention mechanism to refine blurred depth. A frequency division strategy is designed, and a frequency pyramid network is developed to extract features from multiple frequency bands. With this frequency strategy, PFN achieves better visual accuracy than state-of-the-art methods in both indoor and outdoor scenes on the Make3D, KITTI depth, and NYUv2 datasets. Additional experiments on a noisy NYUv2 dataset show that PFN is more reliable than existing deep-learning methods in high-noise scenes.
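
The abstract does not say how the frequency division is carried out, so the following is only a minimal sketch of the general idea of splitting a signal into frequency bands. It assumes a simple difference-of-box-blurs decomposition on a 1-D signal, which is a swapped-in textbook technique and not the PFN method. The names `box_blur`, `split_bands`, and `merge_bands` are hypothetical and chosen for illustration.

```python
def box_blur(signal, radius):
    """Low-pass filter: mean over a window of +/- radius samples, clipped at the edges."""
    n = len(signal)
    out = []
    for i in range(n):
        window = signal[max(0, i - radius):min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def split_bands(signal, radii):
    """Split a signal into len(radii) + 1 bands, highest frequencies first.

    Assumption: each band is the difference between two successively
    smoother copies of the signal. The last band is the smoothest residual.
    """
    bands = []
    current = list(signal)
    for radius in radii:
        smooth = box_blur(current, radius)
        # Detail removed by this blur step forms one band.
        bands.append([c - s for c, s in zip(current, smooth)])
        current = smooth
    bands.append(current)
    return bands


def merge_bands(bands):
    """Recombine bands by summing them sample by sample."""
    return [sum(values) for values in zip(*bands)]


if __name__ == "__main__":
    depth_row = [0.0, 1.0, 0.0, 1.0, 4.0, 5.0, 4.0, 5.0]
    bands = split_bands(depth_row, radii=[1, 2])
    for index, band in enumerate(bands):
        print(index, [round(v, 3) for v in band])
    print([round(v, 3) for v in merge_bands(bands)])
```

Because the bands telescope, summing them recovers the input up to floating-point rounding. A network that processes bands separately could therefore, in principle, treat fine detail and coarse structure differently before fusing them, though how PFN does this is not described here.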
