Title


Universal Approximation Power of Deep Residual Neural Networks via Nonlinear Control Theory

Authors

Paulo Tabuada, Bahman Gharesifard

Abstract


In this paper, we explain the universal approximation capabilities of deep residual neural networks through geometric nonlinear control. Inspired by recent work establishing links between residual networks and control systems, we provide a general sufficient condition for a residual network to have the power of universal approximation by asking the activation function, or one of its derivatives, to satisfy a quadratic differential equation. Many activation functions used in practice satisfy this assumption, exactly or approximately, and we show this property to be sufficient for an adequately deep neural network with $n+1$ neurons per layer to approximate arbitrarily well, on a compact set and with respect to the supremum norm, any continuous function from $\mathbb{R}^n$ to $\mathbb{R}^n$. We further show this result to hold for very simple architectures in which the weights only need to assume two values. The first key technical contribution consists of relating the universal approximation problem to the controllability of an ensemble of control systems corresponding to a residual network, and leveraging classical Lie algebraic techniques to characterize controllability. The second technical contribution is to identify monotonicity as the bridge between controllability of finite ensembles and uniform approximability on compact sets.
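As an illustrative aside (not from the paper itself), the quadratic-differential-equation hypothesis is easy to check concretely for $\tanh$, whose derivative satisfies $\sigma' = 1 - \sigma^2$. The sketch below verifies this numerically with a finite-difference derivative, and also shows the standard reading of a residual layer as a forward-Euler step of a control system; the step size `h` and the weights `W`, `b` here are arbitrary placeholders, not values from the paper.

```python
import numpy as np

# Activation function: tanh satisfies the quadratic ODE  sigma' = 1 - sigma^2.
sigma = np.tanh

x = np.linspace(-4.0, 4.0, 201)

# Finite-difference estimate of sigma'(x) (second-order central differences).
sigma_prime_fd = np.gradient(sigma(x), x)

# The quadratic ODE predicts sigma'(x) = 1 - sigma(x)^2; the residual should
# be small, limited only by finite-difference error.
ode_residual = sigma_prime_fd - (1.0 - sigma(x) ** 2)
max_err = np.max(np.abs(ode_residual))

# A residual layer  x_{k+1} = x_k + h * sigma(W x_k + b)  is a forward-Euler
# discretization of the control system  x'(t) = sigma(W(t) x(t) + b(t)).
h = 0.1                      # placeholder step size / layer scaling
W = np.eye(2)                # placeholder weight matrix
b = np.zeros(2)              # placeholder bias
x0 = np.array([1.0, -0.5])
x1 = x0 + h * sigma(W @ x0 + b)
```

Interpreting depth as the time horizon of this control system is what lets the paper replace the approximation question with a controllability question.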
