Paper Title

On the benefits of non-linear weight updates

Authors

Norridge, Paul

Abstract

Recent work has suggested that the generalisation performance of a DNN is related to the extent to which the Signal-to-Noise Ratio is optimised at each of the nodes. In contrast, Gradient Descent methods do not always lead to SNR-optimal weight configurations. One way to improve SNR performance is to suppress large weight updates and amplify small weight updates. Such balancing is already implicit in some common optimizers, but we propose an approach that makes this explicit. The method applies a non-linear function to gradients prior to making DNN parameter updates. We investigate the performance with such non-linear approaches. The result is an adaptation to existing optimizers that improves performance for many problem types.
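The core idea in the abstract, suppressing large weight updates and amplifying small ones by applying a non-linear function to gradients before the parameter update, can be sketched as follows. The sign-preserving power law and the exponent `p=0.5` here are illustrative assumptions, not the paper's actual choice of non-linearity:

```python
import numpy as np

def nonlinear_grad(grad, p=0.5):
    """Shape a gradient with a sign-preserving power law.

    With p < 1, components with |g| > 1 are suppressed and components
    with |g| < 1 are amplified, balancing update magnitudes across
    weights. The power law and p=0.5 are illustrative assumptions.
    """
    return np.sign(grad) * np.abs(grad) ** p

def sgd_step(weights, grad, lr=0.01, p=0.5):
    """Plain SGD step with the non-linear transform applied to the gradient first."""
    return weights - lr * nonlinear_grad(grad, p)

# One large and one small gradient component:
g = np.array([4.0, -0.04])
print(nonlinear_grad(g))   # the large component shrinks, the small one grows relatively
```

In practice the same transform could be applied to the gradient inside any existing optimizer's update rule, which matches the abstract's framing of the method as an adaptation of existing optimizers.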
