Paper Title

PS-Transformer: Learning Sparse Photometric Stereo Network using Self-Attention Mechanism

Authors

Ikehata, Satoshi

Abstract


Existing deep calibrated photometric stereo networks basically aggregate observations under different lights using pre-defined operations such as linear projection and max pooling. While effective with dense capture, such simple first-order operations often fail to capture the high-order interactions among observations under a small number of different lights. To tackle this issue, this paper presents a deep sparse calibrated photometric stereo network named {\it PS-Transformer}, which leverages a learnable self-attention mechanism to properly capture complex inter-image interactions. PS-Transformer builds upon a dual-branch design to explore both pixel-wise and image-wise features, and each individual feature is trained with intermediate surface-normal supervision to maximize geometric feasibility. A new synthetic dataset named CyclesPS+ is also presented, together with a comprehensive analysis of how to successfully train photometric stereo networks. Extensive results on publicly available benchmark datasets demonstrate that the surface normal prediction accuracy of the proposed method significantly outperforms other state-of-the-art algorithms given the same number of input images, and is even comparable to that of dense algorithms that take 10$\times$ more images as input.
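To illustrate the core idea, the sketch below shows generic scaled dot-product self-attention aggregating per-light feature vectors for a single pixel. This is a minimal illustration of the mechanism the abstract refers to, not the authors' actual PS-Transformer architecture; the feature dimension, random projections, and inputs are all placeholder assumptions. Unlike max pooling, which is a first-order operation applied to each observation independently, the attention matrix here scores every pair of observations, so the aggregated output depends on inter-image interactions.

```python
import numpy as np

def self_attention(features, d_k=None, seed=0):
    """Aggregate per-light observation features with scaled dot-product
    self-attention (generic sketch, not the paper's exact design).

    features: (n_lights, d) array, one feature vector per light direction.
    Returns:  (n_lights, d) attended features that mix information across
              observations via pairwise (second-order) interactions.
    """
    n, d = features.shape
    d_k = d_k or d
    rng = np.random.default_rng(seed)
    # Learnable projections in a real network; random here for illustration.
    Wq = rng.normal(size=(d, d_k)) / np.sqrt(d)
    Wk = rng.normal(size=(d, d_k)) / np.sqrt(d)
    Wv = rng.normal(size=(d, d)) / np.sqrt(d)
    Q, K, V = features @ Wq, features @ Wk, features @ Wv
    scores = Q @ K.T / np.sqrt(d_k)              # (n, n) pairwise scores
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)      # softmax over the light axis
    return attn @ V

# Ten observations of one pixel under ten different (sparse) lights,
# each already encoded into an 8-dimensional feature vector.
obs = np.random.default_rng(1).normal(size=(10, 8))
out = self_attention(obs)
print(out.shape)  # (10, 8)
```

Because the attention weights are computed from the observations themselves, the aggregation adapts to however few lights are available, which is why attention is attractive in the sparse photometric stereo setting compared with fixed pooling operators.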
