Paper Title
VideoMix: Rethinking Data Augmentation for Video Classification
Paper Authors
Paper Abstract
State-of-the-art video action classifiers often suffer from overfitting. They tend to be biased towards specific objects and scene cues rather than the foreground action content, leading to sub-optimal generalization performance. Recent data augmentation strategies have been reported to address overfitting in static image classifiers. Despite their effectiveness on static images, data augmentation has rarely been studied for videos. For the first time in the field, we systematically analyze the efficacy of various data augmentation strategies on the video classification task. We then propose a powerful augmentation strategy, VideoMix. VideoMix creates a new training video by inserting a video cuboid from one video into another. The ground-truth labels are mixed proportionally to the number of voxels contributed by each video. We show that VideoMix lets a model learn beyond object and scene biases and extract more robust cues for action recognition. VideoMix consistently outperforms other augmentation baselines on the Kinetics and the challenging Something-Something-V2 benchmarks. It also improves weakly-supervised action localization performance on THUMOS'14. VideoMix-pretrained models exhibit improved accuracy on the video detection task (AVA).
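The cuboid-paste-and-label-mix idea described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's exact recipe: the `Beta(alpha, alpha)` mixing-ratio sampling and the cube-root cuboid geometry below follow the CutMix convention and are assumptions for the sketch.

```python
import numpy as np

def videomix(video_a, video_b, label_a, label_b, alpha=1.0, rng=None):
    """VideoMix-style augmentation sketch: paste a spatio-temporal cuboid
    from video_b into video_a and mix labels by the voxel ratio.

    video_*: float arrays of shape (T, H, W, C); label_*: one-hot vectors.
    """
    rng = rng or np.random.default_rng()
    T, H, W, _ = video_a.shape
    lam = rng.beta(alpha, alpha)  # target share of video_a voxels (assumed sampler)

    # Side lengths chosen so the cuboid volume is roughly (1 - lam).
    cut = (1.0 - lam) ** (1.0 / 3.0)
    t, h, w = int(T * cut), int(H * cut), int(W * cut)

    # Random cuboid position, kept inside the video bounds.
    t0 = rng.integers(0, T - t + 1)
    y0 = rng.integers(0, H - h + 1)
    x0 = rng.integers(0, W - w + 1)

    mixed = video_a.copy()
    mixed[t0:t0 + t, y0:y0 + h, x0:x0 + w] = \
        video_b[t0:t0 + t, y0:y0 + h, x0:x0 + w]

    # Mix labels proportionally to the actual voxel counts.
    lam_eff = 1.0 - (t * h * w) / (T * H * W)
    label = lam_eff * label_a + (1.0 - lam_eff) * label_b
    return mixed, label
```

A purely spatial variant (the cuboid spanning all frames, i.e. `t = T`) is a natural special case of the same sketch.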