Paper Title
Monocular Depth Distribution Alignment with Low Computation
Paper Authors
Paper Abstract
The performance of monocular depth estimation generally depends on the number of parameters and the computational cost. This leads to a large accuracy gap between light-weight and heavy-weight networks, which limits their application in the real world. In this paper, we model the majority of this accuracy gap as a difference in depth distribution, which we call "distribution drift". To this end, a distribution alignment network (DANet) is proposed. We first design a pyramid scene transformer (PST) module to capture inter-region interactions at multiple scales. By perceiving the differences in depth features between every two regions, DANet tends to predict a reasonable scene structure, fitting the shape of the predicted distribution to that of the ground truth. Then, we propose a local-global optimization (LGO) scheme to supervise the global range of scene depth. Thanks to the alignment of both the depth distribution shape and the scene depth range, DANet sharply alleviates the distribution drift and achieves performance comparable to prior heavy-weight methods while using only 1% of their floating-point operations (FLOPs). Experiments on two datasets, the widely used NYUDv2 dataset and the more challenging iBims-1 dataset, demonstrate the effectiveness of our method. The source code is available at https://github.com/YiLiM1/DANet.
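The core idea behind the PST module, pooling the feature map into region tokens at several pyramid grid scales and letting every pair of regions interact through attention, can be sketched roughly as follows. This is a toy NumPy illustration of the general multi-scale region-attention pattern, not the authors' implementation; the function names, the grid sizes `(1, 2, 4)`, and the use of a single plain self-attention round are all assumptions:

```python
import numpy as np

def pool_regions(feat, grid):
    """Average-pool a (C, H, W) feature map into grid*grid region tokens of shape (grid*grid, C)."""
    hs = np.array_split(np.arange(feat.shape[1]), grid)
    ws = np.array_split(np.arange(feat.shape[2]), grid)
    tokens = [feat[:, hi][:, :, wi].mean(axis=(1, 2)) for hi in hs for wi in ws]
    return np.stack(tokens)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for numerical stability
    return e / e.sum(axis=axis, keepdims=True)

def pyramid_region_attention(feat, grids=(1, 2, 4)):
    """Collect region tokens at several pyramid scales, then let them exchange
    information via one round of scaled dot-product self-attention."""
    tokens = np.concatenate([pool_regions(feat, g) for g in grids])  # (N, C), N = sum(g*g)
    c = tokens.shape[1]
    attn = softmax(tokens @ tokens.T / np.sqrt(c))  # (N, N) inter-region affinities
    return attn @ tokens  # (N, C) tokens refined by cross-region context

# Example: an 8-channel 16x16 feature map yields 1 + 4 + 16 = 21 region tokens.
feat = np.random.default_rng(0).standard_normal((8, 16, 16))
refined = pyramid_region_attention(feat)  # shape (21, 8)
```

The pyramid of grids is what lets the module compare depth statistics both between coarse scene halves and between fine local patches, which is the kind of inter-region reasoning the abstract attributes to PST.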