Paper Title
Learning compositional functions via multiplicative weight updates
Paper Authors
Paper Abstract
Compositionality is a basic structural feature of both biological and artificial neural networks. Learning compositional functions via gradient descent incurs well-known problems like vanishing and exploding gradients, making careful learning rate tuning essential for real-world applications. This paper proves that multiplicative weight updates satisfy a descent lemma tailored to compositional functions. Based on this lemma, we derive Madam -- a multiplicative version of the Adam optimiser -- and show that it can train state-of-the-art neural network architectures without learning rate tuning. We further show that Madam is easily adapted to train natively compressed neural networks by representing their weights in a logarithmic number system. We conclude by drawing connections between multiplicative weight updates and recent findings about synapses in biology.
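To make the idea concrete, below is a minimal NumPy sketch of one multiplicative weight update step in the spirit of Madam as described in the abstract. The function name, the hyperparameters (lr, beta, w_max, eps), and the magnitude-clipping step are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def madam_step(w, grad, g2_ema, lr=0.01, beta=0.999, w_max=3.0, eps=1e-8):
    """One multiplicative weight update step (illustrative sketch, not the
    paper's exact algorithm; all hyperparameter names are assumptions)."""
    # Track an exponential moving average of the squared gradient, as Adam does.
    g2_ema = beta * g2_ema + (1 - beta) * grad**2
    # Normalise the gradient elementwise by its RMS estimate.
    g_norm = grad / (np.sqrt(g2_ema) + eps)
    # Multiplicative update: each weight's magnitude is scaled up or down by a
    # factor close to 1, moving against the gradient relative to its own sign.
    w = w * np.exp(-lr * np.sign(w) * g_norm)
    # Clip magnitudes so weights stay within a bounded dynamic range
    # (assumed here; useful if weights are stored in a logarithmic number system).
    w = np.clip(w, -w_max, w_max)
    return w, g2_ema
```

Because each weight is rescaled by a factor exp(-lr * sign(w) * g_norm), the update size is proportional to the weight's own magnitude, which is what distinguishes a multiplicative update from an additive gradient step and also makes a log-domain weight representation natural: in that representation the update becomes a simple addition to the stored exponent.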