Paper Title

Avoiding Kernel Fixed Points: Computing with ELU and GELU Infinite Networks

Authors

Russell Tsuchida, Tim Pearce, Chris van der Heide, Fred Roosta, Marcus Gallagher

Abstract

Analysing and computing with Gaussian processes arising from infinitely wide neural networks has recently seen a resurgence in popularity. Despite this, many explicit covariance functions of networks with activation functions used in modern networks remain unknown. Furthermore, while the kernels of deep networks can be computed iteratively, theoretical understanding of deep kernels is lacking, particularly with respect to fixed-point dynamics. Firstly, we derive the covariance functions of multi-layer perceptrons (MLPs) with exponential linear units (ELU) and Gaussian error linear units (GELU) and evaluate the performance of the limiting Gaussian processes on some benchmarks. Secondly, and more generally, we analyse the fixed-point dynamics of iterated kernels corresponding to a broad range of activation functions. We find that unlike some previously studied neural network kernels, these new kernels exhibit non-trivial fixed-point dynamics which are mirrored in finite-width neural networks. The fixed point behaviour present in some networks explains a mechanism for implicit regularisation in overparameterised deep models. Our results relate to both the static iid parameter conjugate kernel and the dynamic neural tangent kernel constructions. Software at github.com/RussellTsuchida/ELU_GELU_kernels.
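To make the idea of iterated kernels and their fixed points concrete, the following is a minimal numerical sketch (not the paper's closed-form covariance functions): it iterates the standard conjugate-kernel recursion k^(l+1)(x, x') = sigma_w^2 * E[phi(u) phi(v)] + sigma_b^2, with (u, v) drawn from a bivariate Gaussian with covariance K^(l), using Monte Carlo estimation and a GELU activation. The weight/bias variances, sample count, and initial covariance are illustrative assumptions, not values taken from the paper.

```python
import numpy as np
from scipy.special import erf

def gelu(x):
    # GELU(x) = x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))

def next_layer_kernel(K, sigma_w=1.0, sigma_b=0.1, n_samples=200_000, seed=0):
    """One step of the iterated kernel map applied to a 2x2 covariance K,
    estimated by Monte Carlo over the bivariate Gaussian pre-activations."""
    rng = np.random.default_rng(seed)
    u = rng.multivariate_normal(mean=np.zeros(2), cov=K, size=n_samples)
    g = gelu(u)
    return sigma_w**2 * (g.T @ g) / n_samples + sigma_b**2 * np.ones((2, 2))

# Track the normalised correlation k(x, x') / sqrt(k(x, x) k(x', x')) with depth;
# whether and how it converges illustrates the fixed-point dynamics discussed above.
K = np.array([[1.0, 0.5],
              [0.5, 1.0]])  # illustrative input-layer kernel for two inputs
for layer in range(10):
    K = next_layer_kernel(K, seed=layer)
    corr = K[0, 1] / np.sqrt(K[0, 0] * K[1, 1])
    print(f"layer {layer + 1}: correlation = {corr:.4f}")
```

Repeating the loop for other activations (e.g. ReLU or ELU) shows how the limiting correlation, and hence the behaviour of very deep kernels, depends on the choice of nonlinearity.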
