Paper Title

Sparsity and Heterogeneous Dropout for Continual Learning in the Null Space of Neural Activations

Paper Authors

Ali Abbasi, Parsa Nooralinejad, Vladimir Braverman, Hamed Pirsiavash, Soheil Kolouri

Paper Abstract

Continual/lifelong learning from a non-stationary input data stream is a cornerstone of intelligence. Despite their phenomenal performance in a wide variety of applications, deep neural networks are prone to forgetting their previously learned information upon learning new ones. This phenomenon is called "catastrophic forgetting" and is deeply rooted in the stability-plasticity dilemma. Overcoming catastrophic forgetting in deep neural networks has become an active field of research in recent years. In particular, gradient projection-based methods have recently shown exceptional performance at overcoming catastrophic forgetting. This paper proposes two biologically-inspired mechanisms based on sparsity and heterogeneous dropout that significantly increase a continual learner's performance over a long sequence of tasks. Our proposed approach builds on the Gradient Projection Memory (GPM) framework. We leverage k-winner activations in each layer of a neural network to enforce layer-wise sparse activations for each task, together with a between-task heterogeneous dropout that encourages the network to use non-overlapping activation patterns between different tasks. In addition, we introduce two new benchmarks for continual learning under distributional shift, namely Continual Swiss Roll and ImageNet SuperDog-40. Lastly, we provide an in-depth analysis of our proposed method and demonstrate a significant performance boost on various benchmark continual learning problems.
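The abstract describes two activation-side mechanisms built on top of GPM: a k-winner activation that enforces layer-wise sparsity for each task, and a between-task heterogeneous dropout that discourages the current task from reusing units heavily activated by earlier tasks. The following is a minimal PyTorch sketch of how these two pieces could be wired together; the class names, the exponential keep-probability schedule, and the update_counts bookkeeping are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumptions, not the paper's code) of the two mechanisms
# described in the abstract: k-winner sparse activations and heterogeneous
# dropout driven by how often each unit fired on previous tasks.
import torch
import torch.nn as nn


class KWinners(nn.Module):
    """Keep the k largest activations per sample; zero out the rest."""

    def __init__(self, k: int):
        super().__init__()
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, units). Threshold each row at its k-th largest value.
        threshold = x.topk(self.k, dim=1).values[:, -1, None]
        return x * (x >= threshold).float()


class HeterogeneousDropout(nn.Module):
    """Drop units with probability increasing in their past-task usage."""

    def __init__(self, num_units: int, temperature: float = 1.0):
        super().__init__()
        self.temperature = temperature
        # Running count of how often each unit was active on earlier tasks
        # (the exact statistic and schedule below are assumed for illustration).
        self.register_buffer("activation_counts", torch.zeros(num_units))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.training:
            return x
        # Units used heavily by earlier tasks get a lower keep probability,
        # nudging the new task toward non-overlapping activation patterns.
        keep_prob = torch.exp(-self.activation_counts / self.temperature)
        mask = torch.bernoulli(keep_prob).to(x.device)
        return x * mask

    @torch.no_grad()
    def update_counts(self, activations: torch.Tensor) -> None:
        # Called after finishing a task: accumulate per-unit firing frequency.
        self.activation_counts += (activations > 0).float().mean(dim=0)


# Usage: one hidden layer combining both mechanisms.
layer = nn.Linear(64, 128)
kwin = KWinners(k=16)
hdrop = HeterogeneousDropout(num_units=128)

x = torch.randn(8, 64)
h = hdrop(kwin(torch.relu(layer(x))))
hdrop.update_counts(h)  # record which units the current task relied on
```

In the full method these mechanisms sit alongside GPM-style training, where gradients for new tasks are projected onto the null space of stored neural activations; the sketch above only covers the sparsity and dropout components that the paper adds.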
