Paper Title


A Solver + Gradient Descent Training Algorithm for Deep Neural Networks

Authors

Dhananjay Ashok, Vineel Nagisetty, Christopher Srinivasa, Vijay Ganesh

Abstract


We present a novel hybrid algorithm for training Deep Neural Networks that combines the state-of-the-art Gradient Descent (GD) method with a Mixed Integer Linear Programming (MILP) solver, outperforming GD and its variants in terms of accuracy, as well as resource and data efficiency, for both regression and classification tasks. Our GD+Solver hybrid algorithm, called GDSolver, works as follows: given a DNN $D$ as input, GDSolver invokes GD to partially train $D$ until it gets stuck in a local minimum, at which point GDSolver invokes an MILP solver to exhaustively search a region of the loss landscape around the weight assignments of $D$'s final layer parameters, with the goal of tunnelling through and escaping the local minimum. The process is repeated until the desired accuracy is achieved. In our experiments, we find that GDSolver not only scales well to additional data and very large model sizes, but also outperforms all other competing methods in terms of rates of convergence and data efficiency. For regression tasks, GDSolver produced models that, on average, had 31.5% lower MSE in 48% less time, and for classification tasks on MNIST and CIFAR10, GDSolver was able to achieve the highest accuracy over all competing methods, using only 50% of the training data that GD baselines required.
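The alternating structure described in the abstract (run GD until progress stalls, then hand the final layer to a solver) can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the paper's implementation: the model is reduced to a single linear layer, and the MILP solver call is replaced by a brute-force grid search over a box around the current final-layer weights, which plays the same role of exhaustively probing a local region to tunnel out of a minimum. The function name `gdsolver_sketch` and all hyperparameters are illustrative.

```python
import numpy as np

def gdsolver_sketch(X, y, lr=0.1, tol=1e-6, radius=0.5, grid=21, max_rounds=5):
    """Alternate GD with an exhaustive local search (MILP stand-in)."""
    rng = np.random.default_rng(0)
    w = rng.normal(size=X.shape[1])          # the "final layer" weights

    def mse(weights):
        return np.mean((X @ weights - y) ** 2)

    for _ in range(max_rounds):
        # Phase 1: plain gradient descent until improvement stalls.
        prev = np.inf
        for _ in range(1000):
            grad = 2.0 * X.T @ (X @ w - y) / len(y)
            w = w - lr * grad
            cur = mse(w)
            if prev - cur < tol:             # stuck: progress below tolerance
                break
            prev = cur

        # Phase 2: exhaustively search a box of radius `radius` around w,
        # one coordinate at a time (the role the MILP solver plays in GDSolver).
        offsets = np.linspace(-radius, radius, grid)
        best_w, best_loss = w.copy(), mse(w)
        for i in range(len(w)):
            for d in offsets:
                cand = w.copy()
                cand[i] += d
                if mse(cand) < best_loss:
                    best_w, best_loss = cand.copy(), mse(cand)

        if best_loss >= mse(w) - tol:        # no better point found: converged
            return w
        w = best_w                           # tunnel out and resume GD
    return w
```

On a convex problem the solver phase finds no improvement and the loop exits after one round; the interesting case in the paper is non-convex, where the exhaustive phase can move the final-layer weights past a barrier that GD cannot cross.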
