Title
CRAFT: Cross-Attentional Flow Transformer for Robust Optical Flow
Authors
Abstract
Optical flow estimation aims to find the 2D motion field by identifying corresponding pixels between two images. Despite the tremendous progress of deep learning-based optical flow methods, it remains a challenge to accurately estimate large displacements with motion blur. This is mainly because the correlation volume, the basis of pixel matching, is computed as the dot product of the convolutional features of the two images. The locality of convolutional features makes the computed correlations susceptible to various noises. On large displacements with motion blur, noisy correlations could cause severe errors in the estimated flow. To overcome this challenge, we propose a new architecture, the "CRoss-Attentional Flow Transformer" (CRAFT), aiming to revitalize the correlation volume computation. In CRAFT, a Semantic Smoothing Transformer layer transforms the features of one frame, making them more global and semantically stable. In addition, the dot-product correlations are replaced with transformer Cross-Frame Attention. This layer filters out feature noises through the Query and Key projections, and computes more accurate correlations. On the Sintel (Final) and KITTI (foreground) benchmarks, CRAFT has achieved new state-of-the-art performance. Moreover, to test the robustness of different models on large motions, we designed an image shifting attack that shifts input images to generate large artificial motions. Under this attack, CRAFT performs much more robustly than two representative methods, RAFT and GMA. The code of CRAFT is available at https://github.com/askerlee/craft.
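The contrast the abstract draws can be made concrete with a small sketch. Below, `dot_product_correlation` builds the all-pairs correlation volume as a plain dot product of convolutional features (the RAFT-style baseline the abstract criticizes), while `cross_attention_correlation` first passes the features through Query and Key projections before correlating them, in the spirit of CRAFT's Cross-Frame Attention. This is an illustrative simplification, not the paper's implementation: the projection matrices here are plain NumPy arrays standing in for learned layers, and the normalization by the feature dimension is an assumption borrowed from standard scaled dot-product attention.

```python
import numpy as np

def dot_product_correlation(f1, f2):
    """All-pairs correlation volume from raw convolutional features.

    f1, f2: (H, W, C) feature maps of frames 1 and 2.
    Returns a (H, W, H, W) volume: the correlation of every pixel in
    frame 1 with every pixel in frame 2.
    """
    C = f1.shape[-1]
    # corr[i, j, k, l] = <f1[i, j], f2[k, l]> / sqrt(C)
    return np.einsum('ijc,klc->ijkl', f1, f2) / np.sqrt(C)

def cross_attention_correlation(f1, f2, Wq, Wk):
    """Correlation volume computed attention-style, via Query/Key projections.

    Wq, Wk: (C, D) projection matrices (learned in the real model;
    hypothetical stand-ins here). Projecting before the dot product is
    what lets the model suppress feature noise in the matching scores.
    """
    q = f1 @ Wq                    # (H, W, D) queries from frame 1
    k = f2 @ Wk                    # (H, W, D) keys from frame 2
    D = Wq.shape[1]
    return np.einsum('ijd,kld->ijkl', q, k) / np.sqrt(D)
```

Both functions return a volume of the same `(H, W, H, W)` shape, so the attention-based variant is a drop-in replacement for the dot-product one; the difference is only in how the matching scores are formed.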