Paper Title
A Principle of Least Action for the Training of Neural Networks
Paper Authors
Paper Abstract
Neural networks achieve high generalization performance on many tasks despite being heavily over-parameterized. Since classical statistical learning theory struggles to explain this behavior, much recent effort has focused on uncovering the mechanisms behind it, in the hope of developing a more adequate theoretical framework and gaining better control over trained models. In this work, we adopt an alternative perspective, viewing the neural network as a dynamical system that displaces input particles over time. We conduct a series of experiments and, by analyzing the network's behavior through its displacements, show the presence of a low-kinetic-energy bias in the network's transport map, and link this bias to generalization performance. From this observation, we reformulate the learning problem as follows: find a neural network that solves the task while transporting the data as efficiently as possible. This novel formulation allows us to derive regularity results for the solution network based on Optimal Transport theory. From a practical viewpoint, it also allows us to propose a new learning algorithm that automatically adapts to the complexity of the given task and leads to networks with high generalization ability, even in low-data regimes.
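To make the abstract's central quantity concrete, below is a minimal sketch (not the authors' code) of how the kinetic energy of a residual network's transport map could be measured and penalized during training. Under the dynamical-system view, each residual block's update f_t(x_t) is the displacement of the input "particle" at step t, and the kinetic energy is the average squared displacement across depth. The architecture, the penalty weight `lam`, and all names here are illustrative assumptions; the paper's actual algorithm may differ.

```python
# Sketch: transport-cost (kinetic energy) regularization of a residual network.
import torch
import torch.nn as nn

class ResidualNet(nn.Module):
    """Treats depth as time: x_{t+1} = x_t + f_t(x_t),
    so f_t(x_t) is the displacement at step t."""
    def __init__(self, dim: int, depth: int, n_classes: int):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))
            for _ in range(depth)
        ])
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):
        kinetic = x.new_zeros(())
        for block in self.blocks:
            v = block(x)                                     # displacement (velocity) at this step
            kinetic = kinetic + (v ** 2).sum(dim=1).mean()   # accumulate mean ||v||^2 over the batch
            x = x + v                                        # transport the particles one step
        return self.head(x), kinetic / len(self.blocks)

# Training step: task loss plus a transport-cost penalty, encouraging the
# network to solve the task while displacing the data as little as possible.
model = ResidualNet(dim=2, depth=10, n_classes=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
lam = 0.1  # assumed penalty weight (hyperparameter)

x = torch.randn(64, 2)              # toy batch
y = torch.randint(0, 2, (64,))
logits, kinetic = model(x)
loss = nn.functional.cross_entropy(logits, y) + lam * kinetic
opt.zero_grad()
loss.backward()
opt.step()
```

With `lam = 0`, this reduces to ordinary training; increasing `lam` trades task fit for lower-energy transport maps, which is one simple way to probe the bias the abstract describes.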