Paper Title
0/1 Deep Neural Networks via Block Coordinate Descent
Paper Authors
Paper Abstract
The step function is one of the simplest and most natural activation functions for deep neural networks (DNNs). As it outputs 1 for positive inputs and 0 otherwise, its intrinsic characteristics (e.g., discontinuity and the absence of useful subgradient information) have impeded its development for decades. Even though there is an impressive body of work on designing DNNs with continuous activation functions that can be viewed as surrogates of the step function, the step function still possesses some advantageous properties, such as complete robustness to outliers and the ability to attain the best learning-theoretic guarantee of predictive accuracy. Hence, in this paper, we aim to train DNNs that use the step function as the activation function (dubbed 0/1 DNNs). We first reformulate 0/1 DNNs as an unconstrained optimization problem and then solve it by a block coordinate descent (BCD) method. Moreover, we derive closed-form solutions for the sub-problems of BCD and establish its convergence properties. Furthermore, we integrate $\ell_{2,0}$-regularization into the 0/1 DNN to accelerate the training process and compress the network scale. As a result, the proposed algorithm achieves desirable performance in classifying the MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets.
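For intuition, here is a minimal NumPy sketch (not the authors' code) of the 0/1 activation in a forward pass and of the row sparsity that $\ell_{2,0}$-regularization targets; all layer sizes and variable names are hypothetical.

```python
import numpy as np

def step_activation(z):
    """0/1 (Heaviside) activation: 1 for positive entries, 0 otherwise."""
    return (z > 0).astype(float)

def l20_norm(W):
    """ell_{2,0} regularizer: number of rows of W with nonzero Euclidean
    norm. Driving whole rows to zero prunes the corresponding neurons."""
    return int(np.count_nonzero(np.linalg.norm(W, axis=1)))

# Toy forward pass of a two-layer 0/1 network (all sizes hypothetical).
rng = np.random.default_rng(0)
x = rng.standard_normal(8)            # input vector
W1 = rng.standard_normal((16, 8))     # hidden-layer weights
W2 = rng.standard_normal((10, 16))    # output-layer weights
h = step_activation(W1 @ x)           # hidden activations are exactly 0 or 1
logits = W2 @ h
print("alive hidden neurons:", l20_norm(W1))
```

The BCD template itself can be illustrated on a toy convex problem where, as in the paper's setting, each block update has a closed form. The sketch below alternates exact minimization over two variable blocks of a least-squares objective; it shows only the generic BCD iteration, not the paper's actual sub-problems.

```python
# Block coordinate descent on min_{u,v} ||A u + B v - y||^2:
# fix one block, solve exactly for the other, and alternate.
A = rng.standard_normal((20, 5))
B = rng.standard_normal((20, 5))
y = rng.standard_normal(20)
u = np.zeros(5)
v = np.zeros(5)
for _ in range(50):
    u = np.linalg.lstsq(A, y - B @ v, rcond=None)[0]  # closed-form update of block u
    v = np.linalg.lstsq(B, y - A @ u, rcond=None)[0]  # closed-form update of block v
print("residual:", np.linalg.norm(A @ u + B @ v - y))
```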