Paper Title
A Principle of Least Action for the Training of Neural Networks
Paper Authors
Paper Abstract
Neural networks achieve high generalization performance on many tasks despite being heavily over-parameterized. Since classical statistical learning theory struggles to explain this behavior, much recent effort has focused on uncovering the mechanisms behind it, in the hope of developing a more adequate theoretical framework and gaining better control over trained models. In this work, we adopt an alternative perspective, viewing the neural network as a dynamical system that displaces input particles over time. We conduct a series of experiments and, by analyzing the network's behavior through its displacements, show the presence of a low-kinetic-energy bias in the network's transport map, and link this bias to generalization performance. From this observation, we reformulate the learning problem as follows: find a neural network that solves the task while transporting the data as efficiently as possible. This novel formulation allows us to derive regularity results for the solution network based on Optimal Transport theory. From a practical viewpoint, it also allows us to propose a new learning algorithm that automatically adapts to the complexity of the given task and leads to networks with high generalization ability, even in low-data regimes.
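To make the abstract's central quantity concrete, below is a minimal sketch (not the authors' code) of how the kinetic energy of a residual network's transport map could be measured and penalized during training. Under the dynamical-system view, each residual block's update f_t(x_t) is the displacement of the input "particle" at step t, and the kinetic energy is the average squared displacement across depth. The architecture, the penalty weight `lam`, and all names here are illustrative assumptions; the paper's actual algorithm may differ.

```python
# Sketch: transport-cost (kinetic energy) regularization of a residual network.
import torch
import torch.nn as nn

class ResidualNet(nn.Module):
    """Treats depth as time: x_{t+1} = x_t + f_t(x_t),
    so f_t(x_t) is the displacement at step t."""
    def __init__(self, dim: int, depth: int, n_classes: int):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))
            for _ in range(depth)
        ])
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):
        kinetic = x.new_zeros(())
        for block in self.blocks:
            v = block(x)                                     # displacement (velocity) at this step
            kinetic = kinetic + (v ** 2).sum(dim=1).mean()   # accumulate mean ||v||^2 over the batch
            x = x + v                                        # transport the particles one step
        return self.head(x), kinetic / len(self.blocks)

# Training step: task loss plus a transport-cost penalty, encouraging the
# network to solve the task while displacing the data as little as possible.
model = ResidualNet(dim=2, depth=10, n_classes=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
lam = 0.1  # assumed penalty weight (hyperparameter)

x = torch.randn(64, 2)              # toy batch
y = torch.randint(0, 2, (64,))
logits, kinetic = model(x)
loss = nn.functional.cross_entropy(logits, y) + lam * kinetic
opt.zero_grad()
loss.backward()
opt.step()
```

With `lam = 0`, this reduces to ordinary training; increasing `lam` trades task fit for lower-energy transport maps, which is one simple way to probe the bias the abstract describes.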