Two-stream Fusion Model for Dynamic Hand Gesture Recognition using 3D-CNN and 2D-CNN Optical Flow guided Motion Template

Paper title

Two-stream Fusion Model for Dynamic Hand Gesture Recognition using 3D-CNN and 2D-CNN Optical Flow guided Motion Template

Authors

Sarma, Debajit, Kavyasree, V., Bhuyan, M. K.

Abstract

Hand gestures can be a useful tool for many applications in the human-computer interaction community. Hand gesture techniques apply across a broad range of areas, such as sign language recognition and robotic surgery. In hand gesture recognition, properly detecting and tracking the moving hand is challenging because hands vary in shape and size. The objective here is to track the movement of the hand irrespective of its shape, size, and color. To this end, a motion template guided by optical flow (OFMT) is proposed. An OFMT is a compact representation of the motion information of a gesture, encoded into a single image. The experiments use different datasets featuring a bare hand with an open palm and a folded palm wearing a green glove, and in both cases the OFMT images could be generated with equal precision. Recently, deep network-based techniques have shown impressive improvements over conventional techniques based on hand-crafted features. Moreover, the literature shows that feeding informative input data through different streams helps to increase recognition accuracy. This work proposes a two-stream fusion model for hand gesture recognition, together with a compact yet efficient motion template based on optical flow. Specifically, the two-stream network consists of two streams: a 3D convolutional neural network (C3D) that takes gesture videos as input and a 2D-CNN that takes OFMT images as input. C3D has shown its efficiency in capturing the spatio-temporal information of a video. The OFMT, in turn, provides additional motion information and helps to eliminate irrelevant gestures. Though each stream can work independently, the two are combined with a fusion scheme to boost the recognition results. The efficiency of the proposed two-stream network is demonstrated on two databases.
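The abstract does not say which fusion scheme combines the two streams. As a minimal sketch only, assuming a simple score-level (late) fusion, the snippet below averages the class probabilities of a video stream and a motion-template stream. The function names and the `video_weight` parameter are hypothetical and are not taken from the paper; the per-class scores stand in for the outputs of the C3D and the 2D-CNN.

```python
import math


def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(video_logits, template_logits, video_weight=0.5):
    """Weighted average of the two streams' class probabilities.

    Assumption: score-level fusion. video_weight is a hypothetical
    tuning parameter, not a value reported by the authors.
    """
    if len(video_logits) != len(template_logits):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= video_weight <= 1.0:
        raise ValueError("video_weight must lie in [0, 1]")
    p_video = softmax(video_logits)
    p_template = softmax(template_logits)
    return [
        video_weight * a + (1.0 - video_weight) * b
        for a, b in zip(p_video, p_template)
    ]


def predict(video_logits, template_logits, video_weight=0.5):
    """Return the index of the gesture class with the highest fused score."""
    fused = fuse_scores(video_logits, template_logits, video_weight)
    return max(range(len(fused)), key=fused.__getitem__)
```

With `video_weight=1.0` or `video_weight=0.0` the prediction falls back to a single stream, which mirrors the remark that each stream can work independently.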
