Paper Title
Dual Lottery Ticket Hypothesis
Paper Authors
Paper Abstract
Fully exploiting the learning capacity of neural networks requires overparameterized dense networks. On the other hand, directly training sparse neural networks typically results in unsatisfactory performance. The Lottery Ticket Hypothesis (LTH) provides a novel view for investigating sparse network training while maintaining capacity. Concretely, it claims that winning tickets exist in a randomly initialized dense network, can be found by iterative magnitude pruning, and preserve promising trainability (or, as we say, are in a trainable condition). In this work, we regard the winning ticket from LTH as the subnetwork that is in a trainable condition and take its performance as our benchmark; we then proceed from a complementary direction to articulate the Dual Lottery Ticket Hypothesis (DLTH): randomly selected subnetworks from a randomly initialized dense network can be transformed into a trainable condition and achieve admirable performance compared with LTH -- random tickets in a given lottery pool can be transformed into winning tickets. Specifically, using uniform-randomly selected subnetworks to represent the general case, we propose a simple sparse network training strategy, Random Sparse Network Transformation (RST), to substantiate our DLTH. Concretely, we introduce a regularization term to borrow learning capacity and realize information extrusion from the weights that will be masked. After the transformation of the randomly selected subnetwork is finished, we conduct regular finetuning to evaluate the model, using fair comparisons with LTH and other strong baselines. Extensive experiments on several public datasets and comparisons with competitive approaches validate our DLTH as well as the effectiveness of the proposed model, RST. Our work is expected to pave the way for new research directions in sparse network training. Our code is available at https://github.com/yueb17/DLTH.
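Since the abstract only sketches RST at a high level, a hedged illustration of the described idea may help: a growing regularization penalty is applied to the weights of a randomly chosen subnetwork that are scheduled to be masked, "extruding" their information into the kept weights, after which the masked weights are pruned and regular finetuning follows. The sketch below is a minimal PyTorch rendering under my own assumptions; the function names (`random_mask`, `rst_transform`) and all hyperparameters (sparsity level, the lambda schedule, optimizer settings) are illustrative and not the paper's exact recipe.

```python
# Minimal sketch of the RST idea described in the abstract (assumed details,
# not the authors' exact implementation).
import torch
import torch.nn as nn

def random_mask(model: nn.Module, sparsity: float = 0.9):
    """Uniform-randomly select a subnetwork: 1 = kept weight, 0 = to be masked."""
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:  # only consider weight matrices / conv kernels
            masks[name] = (torch.rand_like(p) > sparsity).float()
    return masks

def rst_transform(model, masks, loader, epochs=30,
                  lam_step=1e-4, lam_max=1.0, lr=0.01):
    """Information extrusion: a gradually growing L2 penalty on the to-be-masked
    weights pushes them toward zero while the kept weights absorb the capacity."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    lam = 0.0
    for _ in range(epochs):
        for x, y in loader:
            loss = criterion(model(x), y)
            # penalize only the weights scheduled for removal (mask == 0)
            for name, p in model.named_parameters():
                if name in masks:
                    loss = loss + lam * ((1 - masks[name]) * p).pow(2).sum()
            opt.zero_grad()
            loss.backward()
            opt.step()
        lam = min(lam + lam_step * len(loader), lam_max)  # grow the penalty
    # after extrusion, hard-prune the masked weights; regular finetuning follows
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])
    return model
```

The key design point the abstract emphasizes is that the mask is chosen randomly before training, in contrast to LTH, which discovers the mask via iterative magnitude pruning; the regularization stage is what turns that random ticket into a trainable one.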