Paper Title
Artificial Neural Variability for Deep Learning: On Overfitting, Noise Memorization, and Catastrophic Forgetting
Paper Authors
Paper Abstract
Deep learning is often criticized for two serious issues that rarely exist in natural nervous systems: overfitting and catastrophic forgetting. A deep network can even memorize randomly labelled data, in which the instance-label pairs carry essentially no knowledge. And when a deep network learns continually by accommodating new tasks, it usually quickly overwrites the knowledge learned from previous tasks. It is well known in neuroscience that human brain reactions exhibit substantial variability even in response to the same stimulus, a phenomenon referred to as {\it neural variability}. This mechanism balances accuracy and plasticity/flexibility in the motor learning of natural nervous systems. It thus motivates us to design a similar mechanism, named {\it artificial neural variability} (ANV), which helps artificial neural networks inherit some advantages of ``natural'' neural networks. We rigorously prove that ANV acts as an implicit regularizer of the mutual information between the training data and the learned model. This result theoretically guarantees that ANV strictly improves generalizability, robustness to label noise, and robustness to catastrophic forgetting. We then devise a {\it neural variable risk minimization} (NVRM) framework and {\it neural variable optimizers} to achieve ANV for conventional network architectures in practice. Empirical studies demonstrate that NVRM can effectively relieve overfitting, label noise memorization, and catastrophic forgetting at negligible cost. \footnote{Code: \url{https://github.com/zeke-xie/artificial-neural-variability-for-deep-learning}.}
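To make the NVRM idea concrete, below is a minimal PyTorch sketch of one neural-variable SGD step, assuming ANV is realized by perturbing the weights with zero-mean Gaussian noise before each gradient evaluation and restoring them afterwards. The function name nvrm_sgd_step and its parameters are illustrative, not the authors' API; the official implementation is in the linked repository.

import torch

def nvrm_sgd_step(params, loss_fn, lr=0.1, noise_std=0.01):
    """One hypothetical NVRM-style SGD step (a sketch, not the official
    optimizer): evaluate the gradient at noise-perturbed weights, then
    apply the update to the clean, unperturbed weights. `noise_std`
    sets the scale of the artificial neural variability."""
    params = list(params)
    noises = []
    # Temporarily perturb each weight with zero-mean Gaussian noise.
    with torch.no_grad():
        for p in params:
            eps = noise_std * torch.randn_like(p)
            p.add_(eps)
            noises.append(eps)
    # The gradient is computed at the perturbed point.
    loss = loss_fn()
    loss.backward()
    with torch.no_grad():
        for p, eps in zip(params, noises):
            p.sub_(eps)          # restore the clean weights
            p.sub_(lr * p.grad)  # standard SGD update
            p.grad = None
    return loss.item()

For example, nvrm_sgd_step(model.parameters(), lambda: criterion(model(x), y)) would perform one such step; the noise scale noise_std is the knob that trades precision against plasticity, mirroring the neural variability discussed above.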