Paper Title
ETAD: Training Action Detection End to End on a Laptop
Paper Authors
Paper Abstract
Temporal action detection (TAD) with end-to-end training often suffers from a huge demand for computing resources due to long video durations. In this work, we propose an efficient temporal action detector (ETAD) that can be trained directly from video frames with extremely low GPU memory consumption. Our main idea is to minimize and balance the heavy computation among features and gradients in each training iteration. We propose to sequentially forward snippet frames through the video encoder, and to backpropagate only a small, necessary portion of gradients to update the encoder. To further alleviate computational redundancy in training, we propose to dynamically sample only a small subset of proposals during training. Moreover, various sampling strategies and ratios are studied for both the encoder and the detector. ETAD achieves state-of-the-art performance on TAD benchmarks with remarkable efficiency. On ActivityNet-1.3, ETAD reaches 38.25% average mAP after 18 hours of training, with only 1.3 GB of memory consumption per video under end-to-end training. Our code will be publicly released.
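To make the memory-saving scheme concrete, below is a minimal PyTorch sketch of the kind of training loop the abstract describes: snippets are forwarded sequentially, and only a sampled subset of them retains the autograd graph, so the encoder is updated through a small, necessary portion of gradients. This is a sketch under stated assumptions, not the authors' implementation; the `encoder`, `detector`, and `targets` interfaces and the sampling ratio are illustrative placeholders.

```python
# Illustrative sketch only: interfaces and ratios are assumptions,
# not the ETAD reference code.
import torch
import torch.nn as nn


def train_step(encoder: nn.Module, detector: nn.Module,
               snippets: torch.Tensor,       # (N, C, T, H, W): frames of one video, split into N snippets
               targets,                      # ground-truth actions (format assumed by the detector)
               snippet_ratio: float = 0.3):  # fraction of snippets that keep gradients (illustrative)
    n = snippets.shape[0]
    # Randomly pick the small subset of snippets whose gradients will update the encoder.
    grad_idx = set(torch.randperm(n)[: max(1, int(n * snippet_ratio))].tolist())

    feats = []
    for i in range(n):                        # sequential forwarding keeps peak memory low
        clip = snippets[i : i + 1]
        if i in grad_idx:
            # Keep the autograd graph only for the sampled snippets.
            feats.append(encoder(clip))
        else:
            # Forward without storing activations: no graph, minimal memory.
            with torch.no_grad():
                feats.append(encoder(clip))
    features = torch.cat(feats, dim=0)        # full feature sequence for the detector

    # The detector sees all features, but backward reaches the encoder
    # only through the snippets in grad_idx.
    loss = detector(features, targets)        # assumed to return a scalar loss
    loss.backward()
    return loss
```

The dynamic proposal sampling mentioned in the abstract would apply the same idea inside the detector, computing the training loss on only a random subset of proposals rather than all of them.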