Paper Title
Energy-efficient and Robust Cumulative Training with Net2Net Transformation
Paper Authors
Paper Abstract
Deep learning has achieved state-of-the-art accuracies on several computer vision tasks. However, the computational and energy requirements associated with training such deep neural networks can be quite high. In this paper, we propose a cumulative training strategy with Net2Net transformation that achieves training computational efficiency without incurring large accuracy loss, in comparison to a model trained from scratch. We achieve this by first training a small network (with fewer parameters) on a small subset of the original dataset, and then gradually expanding the network using Net2Net transformation to train incrementally on larger subsets of the dataset. This incremental training strategy with Net2Net utilizes function-preserving transformations that transfer knowledge from each previous small network to the next larger network, thereby reducing the overall training complexity. Our experiments demonstrate that compared with training from scratch, cumulative training yields ~2x reduction in computational complexity for training TinyImageNet using VGG19 at iso-accuracy. Besides training efficiency, a key advantage of our cumulative training strategy is that we can perform pruning during Net2Net expansion to obtain a final network with an optimal configuration (~0.4x lower inference compute complexity) compared to conventional training from scratch. We also demonstrate that the final network obtained from cumulative training yields better generalization performance and noise robustness. Further, we show that mutual inference from all the networks created with cumulative Net2Net expansion enables improved adversarial input detection.
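To make the function-preserving expansion concrete, below is a minimal NumPy sketch of the Net2WiderNet operation (Chen et al., 2016) that the cumulative strategy builds on. The helper name `net2wider` and the layer shapes are illustrative assumptions, not the authors' implementation: extra hidden units replicate existing ones, and the duplicated outgoing weights are rescaled so the widened network computes exactly the same function before further training.

```python
import numpy as np

def net2wider(w1, b1, w2, new_width):
    """Function-preserving widening of a hidden layer (Net2WiderNet sketch).

    w1: (hidden, n_in)  weights of the layer being widened
    b1: (hidden,)       its biases
    w2: (n_out, hidden) weights of the following layer
    """
    hidden = w1.shape[0]
    assert new_width > hidden
    # Mapping g: the first `hidden` new units copy themselves; the extra
    # units replicate randomly chosen existing units.
    g = np.concatenate([np.arange(hidden),
                        np.random.randint(0, hidden, new_width - hidden)])
    counts = np.bincount(g, minlength=hidden)   # replication factor per unit
    w1_new, b1_new = w1[g], b1[g]               # duplicate rows and biases
    # Divide duplicated outgoing weights by the replication factor so the
    # next layer's pre-activations, and hence the output, are unchanged.
    w2_new = w2[:, g] / counts[g]
    return w1_new, b1_new, w2_new

# Sanity check: the widened network computes the same function.
rng = np.random.default_rng(0)
x = rng.standard_normal(8)
w1, b1 = rng.standard_normal((16, 8)), rng.standard_normal(16)
w2 = rng.standard_normal((4, 16))
relu = lambda v: np.maximum(v, 0.0)
w1n, b1n, w2n = net2wider(w1, b1, w2, new_width=24)
assert np.allclose(w2 @ relu(w1 @ x + b1), w2n @ relu(w1n @ x + b1n))
```

In the cumulative schedule described in the abstract, an expansion of this kind would be applied between training stages, after which training continues on a larger subset of the dataset from the transferred weights rather than from scratch.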