Paper Title


Video based Object 6D Pose Estimation using Transformers

Paper Authors

Apoorva Beedu, Huda Alamri, Irfan Essa

Paper Abstract


We introduce a Transformer based 6D Object Pose Estimation framework VideoPose, comprising an end-to-end attention based modelling architecture, that attends to previous frames in order to estimate accurate 6D Object Poses in videos. Our approach leverages the temporal information from a video sequence for pose refinement, along with being computationally efficient and robust. Compared to existing methods, our architecture is able to capture and reason from long-range dependencies efficiently, thus iteratively refining over video sequences. Experimental evaluation on the YCB-Video dataset shows that our approach is on par with the state-of-the-art Transformer methods, and performs significantly better relative to CNN based approaches. Further, with a speed of 33 fps, it is also more efficient and therefore applicable to a variety of applications that require real-time object pose estimation. Training code and pretrained models are available at https://github.com/ApoorvaBeedu/VideoPose
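The core idea the abstract describes — refining a per-frame pose estimate by attending over features from other frames in the sequence — can be illustrated with a minimal NumPy sketch. This is not the authors' VideoPose implementation; the function names, feature dimensions, random projection weights, and the 7-D pose parameterization (unit quaternion + translation) are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend_over_frames(frame_feats, d_k=None, seed=0):
    """Single-head self-attention over per-frame features.

    frame_feats: (T, D) array, one feature vector per video frame.
    Returns (T, D): each frame's feature refined by attending to the
    other frames -- a toy stand-in for the temporal attention the
    abstract describes for pose refinement.
    """
    T, D = frame_feats.shape
    d_k = d_k or D
    rng = np.random.default_rng(seed)
    # Hypothetical learned Q/K/V projections; random here for illustration.
    W_q, W_k, W_v = (rng.standard_normal((D, d_k)) / np.sqrt(D) for _ in range(3))
    Q, K, V = frame_feats @ W_q, frame_feats @ W_k, frame_feats @ W_v
    scores = Q @ K.T / np.sqrt(d_k)      # (T, T) frame-to-frame affinities
    return softmax(scores, axis=-1) @ V  # temporally refined features

def pose_head(refined_feats, W_pose):
    """Map each refined frame feature to a 7-D pose:
    a unit quaternion (4) for rotation plus a translation (3)."""
    out = refined_feats @ W_pose  # (T, 7)
    quat = out[:, :4]
    quat = quat / np.linalg.norm(quat, axis=1, keepdims=True)
    return np.concatenate([quat, out[:, 4:]], axis=1)
```

In the real model the projections and pose head are trained end to end, and attention lets each frame's estimate borrow evidence from frames where the object is less occluded or better lit.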
