Paper Title
The dynamics of representation learning in shallow, non-linear autoencoders
Paper Authors
Paper Abstract
Autoencoders are the simplest neural networks for unsupervised learning, and thus an ideal framework for studying feature learning. While a detailed understanding of the dynamics of linear autoencoders has recently been obtained, the study of non-linear autoencoders has been hindered by the technical difficulty of handling training data with non-trivial correlations, a fundamental prerequisite for feature extraction. Here, we study the dynamics of feature learning in non-linear, shallow autoencoders. We derive a set of asymptotically exact equations that describe the generalisation dynamics of autoencoders trained with stochastic gradient descent (SGD) in the limit of high-dimensional inputs. These equations reveal that autoencoders learn the leading principal components of their inputs sequentially. An analysis of the long-time dynamics explains the failure of sigmoidal autoencoders to learn with tied weights, and highlights the importance of training the bias in ReLU autoencoders. Building on previous results for linear networks, we analyse a modification of the vanilla SGD algorithm which allows learning of the exact principal components. Finally, we show that our equations accurately describe the generalisation dynamics of non-linear autoencoders on realistic datasets such as CIFAR10.
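To make the setting concrete, below is a minimal numpy sketch (not the authors' code) of the kind of experiment the abstract describes: a shallow non-linear autoencoder with untied weights, trained with one-sample (online) SGD on high-dimensional Gaussian inputs whose covariance has a few dominant "spike" directions. The dimensions, learning rate, activation, and spiked-covariance data model are all illustrative assumptions, chosen only to show the sequential alignment of the network with the leading principal components.

```python
# Illustrative sketch of a shallow non-linear autoencoder trained with
# online SGD on spiked Gaussian data. All hyperparameters and the data
# model are assumptions for demonstration, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

D, K = 500, 3              # input dimension, number of hidden units
lr, steps = 0.05, 200_000  # per-sample learning rate (scaled by 1/D), SGD steps

# Synthetic data: isotropic noise plus K orthogonal spike directions with
# decreasing variances, so the leading principal components are separated.
spikes = np.linalg.qr(rng.standard_normal((D, K)))[0]  # D x K orthonormal
strengths = np.array([10.0, 6.0, 3.0])                 # spike variances

def sample_x():
    """One input with covariance I + spikes @ diag(strengths) @ spikes.T."""
    z = rng.standard_normal(K) * np.sqrt(strengths)
    return spikes @ z + rng.standard_normal(D)

# Shallow autoencoder x_hat = W2 @ g(W1 @ x), with untied weights.
g = np.tanh
dg = lambda a: 1.0 - np.tanh(a) ** 2
W1 = rng.standard_normal((K, D)) / np.sqrt(D)
W2 = rng.standard_normal((D, K)) / np.sqrt(D)

for t in range(steps):
    x = sample_x()
    a = W1 @ x              # pre-activations, shape (K,)
    h = g(a)                # hidden activations
    err = W2 @ h - x        # reconstruction error
    # One-sample SGD on the quadratic reconstruction loss 0.5 * ||err||^2
    W2 -= lr / D * np.outer(err, h)
    W1 -= lr / D * np.outer((W2.T @ err) * dg(a), x)

# Overlap of each (normalised) decoder column with each spike direction.
# As training proceeds, this matrix should approach a signed permutation,
# with the strongest spike picked up first, illustrating the sequential
# learning of principal components described in the abstract.
cols = W2 / np.linalg.norm(W2, axis=0)
print(np.abs(spikes.T @ cols).round(2))
```

The 1/D scaling of the learning rate mirrors the high-dimensional limit in which the paper's asymptotic equations are derived; tracking the overlap matrix over time, rather than only at the end, is what would expose the sequential, one-component-at-a-time learning.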