Paper Title


Video based Object 6D Pose Estimation using Transformers

Paper Authors

Apoorva Beedu, Huda Alamri, Irfan Essa

Paper Abstract


We introduce a Transformer based 6D Object Pose Estimation framework VideoPose, comprising an end-to-end attention based modelling architecture, that attends to previous frames in order to estimate accurate 6D Object Poses in videos. Our approach leverages the temporal information from a video sequence for pose refinement, along with being computationally efficient and robust. Compared to existing methods, our architecture is able to capture and reason from long-range dependencies efficiently, thus iteratively refining over video sequences. Experimental evaluation on the YCB-Video dataset shows that our approach is on par with the state-of-the-art Transformer methods, and performs significantly better relative to CNN based approaches. Further, with a speed of 33 fps, it is also more efficient and therefore applicable to a variety of applications that require real-time object pose estimation. Training code and pretrained models are available at https://github.com/ApoorvaBeedu/VideoPose
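The core idea the abstract describes — refining a per-frame pose estimate by attending over features from other frames in the sequence — can be illustrated with a minimal NumPy sketch. This is not the authors' VideoPose implementation; the function names, feature dimensions, random projection weights, and the 7-D pose parameterization (unit quaternion + translation) are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend_over_frames(frame_feats, d_k=None, seed=0):
    """Single-head self-attention over per-frame features.

    frame_feats: (T, D) array, one feature vector per video frame.
    Returns (T, D): each frame's feature refined by attending to the
    other frames -- a toy stand-in for the temporal attention the
    abstract describes for pose refinement.
    """
    T, D = frame_feats.shape
    d_k = d_k or D
    rng = np.random.default_rng(seed)
    # Hypothetical learned Q/K/V projections; random here for illustration.
    W_q, W_k, W_v = (rng.standard_normal((D, d_k)) / np.sqrt(D) for _ in range(3))
    Q, K, V = frame_feats @ W_q, frame_feats @ W_k, frame_feats @ W_v
    scores = Q @ K.T / np.sqrt(d_k)      # (T, T) frame-to-frame affinities
    return softmax(scores, axis=-1) @ V  # temporally refined features

def pose_head(refined_feats, W_pose):
    """Map each refined frame feature to a 7-D pose:
    a unit quaternion (4) for rotation plus a translation (3)."""
    out = refined_feats @ W_pose  # (T, 7)
    quat = out[:, :4]
    quat = quat / np.linalg.norm(quat, axis=1, keepdims=True)
    return np.concatenate([quat, out[:, 4:]], axis=1)
```

In the real model the projections and pose head are trained end to end, and attention lets each frame's estimate borrow evidence from frames where the object is less occluded or better lit.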
