Paper Title

Transformer Tracking with Cyclic Shifting Window Attention

Authors

Zikai Song, Junqing Yu, Yi-Ping Phoebe Chen, Wei Yang

Abstract

The transformer architecture has been showing its great strength in visual object tracking, owing to its effective attention mechanism. Existing transformer-based approaches adopt a pixel-to-pixel attention strategy on flattened image features and unavoidably ignore the integrity of objects. In this paper, we propose a new transformer architecture with multi-scale cyclic shifting window attention for visual object tracking, elevating the attention from the pixel level to the window level. The cross-window multi-scale attention has the advantage of aggregating attention at different scales and generates the best fine-scale match for the target object. Furthermore, the cyclic shifting strategy brings greater accuracy by expanding the window samples with positional information, and at the same time saves huge amounts of computation by removing redundant calculations. Extensive experiments demonstrate the superior performance of our method, which also sets new state-of-the-art records on five challenging datasets: the VOT2020, UAV123, LaSOT, TrackingNet, and GOT-10k benchmarks.
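The cyclic shifting idea described in the abstract can be illustrated with a small sketch. The snippet below is only a toy illustration of the general mechanism (cyclically rolling a feature map so that every window position yields a fully populated sample, then partitioning into non-overlapping windows for window-level attention); it is not the authors' implementation, and the feature-map size, window size, and shift amount are arbitrary assumptions.

```python
import numpy as np

def cyclic_shift(feat, shift):
    # Cyclically roll the feature map along both spatial axes, so
    # windows near the border wrap around instead of being padded.
    # (Illustrative only; not the paper's actual implementation.)
    return np.roll(feat, shift=(shift, shift), axis=(0, 1))

def window_partition(feat, win):
    # Split an (H, W, C) feature map into non-overlapping
    # (win, win, C) windows; attention would then be computed
    # within / across these windows rather than per pixel.
    H, W, C = feat.shape
    return (feat.reshape(H // win, win, W // win, win, C)
                .transpose(0, 2, 1, 3, 4)
                .reshape(-1, win, win, C))

# Toy 8x8 single-channel feature map (values = flat pixel index).
feat = np.arange(8 * 8, dtype=np.float32).reshape(8, 8, 1)
shifted = cyclic_shift(feat, shift=2)
windows = window_partition(shifted, win=4)
print(windows.shape)  # (4, 4, 4, 1): four 4x4 windows
```

Because the roll is cyclic, shifted window samples reuse the same feature values in new positions, which is what allows redundant computation to be shared rather than recomputed for each shift.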
