Paper Title


LWA-HAND: Lightweight Attention Hand for Interacting Hand Reconstruction

Paper Authors

Xinhan Di, Pengqian Yu

Paper Abstract


Recent years have witnessed great success for hand reconstruction in real-time applications such as virtual reality and augmented reality, while two-hand reconstruction for interacting hands with efficient transformers remains unexplored. In this paper, we propose a method called lightweight attention hand (LWA-HAND) to reconstruct hands at low FLOPs from a single RGB image. To solve the occlusion and interaction problems in efficient attention architectures, we propose three mobile attention modules. The first is a lightweight feature attention module that extracts both a local occlusion representation and a global image patch representation in a coarse-to-fine manner. The second is a cross image-and-graph bridge module that fuses image context with hand vertex features. The third is a lightweight cross-attention mechanism that uses element-wise operations to attend across the two hands in linear complexity. The resulting model achieves performance comparable to state-of-the-art models on the InterHand2.6M benchmark, while reducing the computation to 0.47 GFLOPs against the 10 GFLOPs to 20 GFLOPs required by state-of-the-art models.
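To make the third module concrete, below is a minimal PyTorch sketch of a cross-attention with linear complexity built from element-wise operations, in the spirit of the abstract's description. It is an illustration under stated assumptions, not the authors' implementation: the class name LinearCrossAttention, the pooled-context gating, and the tensor shapes are hypothetical. The key point it shows is that pooling one hand's keys and values into a single per-channel context vector, then gating the other hand's queries element-wise, costs O(N + M) rather than the O(N·M) of a full attention score matrix.

```python
# A minimal, hypothetical sketch of linear-complexity cross-attention
# between two hands' token features. Not the LWA-HAND release code.
import torch
import torch.nn as nn

class LinearCrossAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # x:       (B, N, C) tokens of one hand
        # context: (B, M, C) tokens of the other hand
        q = self.to_q(x)                        # (B, N, C)
        k = self.to_k(context).softmax(dim=1)   # per-channel weights over M tokens
        v = self.to_v(context)                  # (B, M, C)
        # Pool the other hand into one context vector: O(M*C),
        # avoiding the O(N*M) score matrix of standard attention.
        ctx = (k * v).sum(dim=1, keepdim=True)  # (B, 1, C)
        # Element-wise modulation of the queries by the pooled context.
        out = q * torch.sigmoid(ctx)            # broadcast over the N tokens
        return self.proj(out)

# Usage: fuse right-hand vertex features with left-hand context
# (778 is the MANO per-hand vertex count; 256 channels is an assumption).
left, right = torch.randn(2, 778, 256), torch.randn(2, 778, 256)
attn = LinearCrossAttention(256)
fused_right = attn(right, left)  # (2, 778, 256)
```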
