VideoSSL: Semi-Supervised Learning for Video Classification

Paper title

VideoSSL: Semi-Supervised Learning for Video Classification

Authors

Longlong Jing, Toufiq Parag, Zhe Wu, Yingli Tian, Hongcheng Wang

Abstract

We propose VideoSSL, a semi-supervised learning approach for video classification using convolutional neural networks (CNNs). As in other computer vision tasks, existing supervised video classification methods demand a large amount of labeled data to attain good performance. However, annotating a large dataset is expensive and time-consuming. To minimize the dependence on a large annotated dataset, our proposed semi-supervised method trains from a small number of labeled examples and exploits two regulatory signals from unlabeled data. The first signal is the pseudo-labels of unlabeled examples, computed from the confidences of the CNN being trained. The second is the normalized probabilities predicted by an image classifier CNN, which capture information about the appearance of the objects of interest in the video. We show that, under the supervision of these guiding signals from unlabeled examples, a video classification CNN can achieve impressive performance using a small fraction of annotated examples on three publicly available datasets: UCF101, HMDB51 and Kinetics.
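The two signals described in the abstract can be illustrated with a small loss computation. The following is only a minimal sketch in plain Python, not the authors' implementation. It assumes that the video network has one output over action classes and a hypothetical auxiliary output over the image classifier's classes, that pseudo-labels are kept only above a confidence threshold, and that the three terms are combined by a weighted sum. The names `video_ssl_loss`, `threshold`, `w_pseudo` and `w_appearance`, along with their default values, are illustrative assumptions that do not come from the abstract.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, probs, eps=1e-12):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))


def one_hot(index, n):
    return [1.0 if i == index else 0.0 for i in range(n)]


def video_ssl_loss(labeled, unlabeled, threshold=0.9,
                   w_pseudo=1.0, w_appearance=1.0):
    """Combine a supervised term with two unlabeled-data terms.

    labeled:   list of (video_logits, class_index)
    unlabeled: list of (video_logits, aux_logits, image_probs), where
               image_probs are the normalized probabilities from a separate
               image classifier and aux_logits is a hypothetical auxiliary
               output of the video network (an assumption of this sketch).
    """
    # Supervised cross-entropy on the few labeled examples.
    sup = sum(
        cross_entropy(one_hot(y, len(z)), softmax(z)) for z, y in labeled
    ) / max(len(labeled), 1)

    pseudo = 0.0
    appearance = 0.0
    for video_logits, aux_logits, image_probs in unlabeled:
        probs = softmax(video_logits)
        confidence = max(probs)
        # Signal 1: pseudo-label from the network's own confident prediction.
        # The thresholding rule is an assumption, not taken from the abstract.
        if confidence >= threshold:
            label = probs.index(confidence)
            pseudo += cross_entropy(one_hot(label, len(probs)), probs)
        # Signal 2: match the image classifier's normalized probabilities.
        appearance += cross_entropy(image_probs, softmax(aux_logits))

    n_unlabeled = max(len(unlabeled), 1)
    pseudo /= n_unlabeled
    appearance /= n_unlabeled

    total = sup + w_pseudo * pseudo + w_appearance * appearance
    return {"supervised": sup, "pseudo": pseudo,
            "appearance": appearance, "total": total}


if __name__ == "__main__":
    uniform = [1 / 3, 1 / 3, 1 / 3]
    labeled = [([2.0, 0.0], 0)]
    unlabeled = [
        ([5.0, 0.0], [0.0, 0.0, 0.0], uniform),  # confident: gets a pseudo-label
        ([0.1, 0.0], [0.0, 0.0, 0.0], uniform),  # not confident: skipped
    ]
    print(video_ssl_loss(labeled, unlabeled))
```

In this sketch only the first unlabeled clip contributes a pseudo-label term, because the second falls below the assumed confidence threshold, while both clips contribute to the appearance term. In practice the same quantities would be computed on mini-batches inside a deep learning framework.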
