Breaking Time Invariance: Assorted-Time Normalization for RNNs

Paper Title

Breaking Time Invariance: Assorted-Time Normalization for RNNs

Authors

Cole Pospisil, Vasily Zadorozhnyy, Qiang Ye

Abstract

Methods such as Layer Normalization (LN) and Batch Normalization (BN) have proven to be effective in improving the training of Recurrent Neural Networks (RNNs). However, existing methods normalize using only the instantaneous information at one particular time step, and the result of the normalization is a preactivation state with a time-independent distribution. This implementation fails to account for certain temporal differences inherent in the inputs and the architecture of RNNs. Since these networks share weights across time steps, it may also be desirable to account for the connections between time steps in the normalization scheme. In this paper, we propose a normalization method called Assorted-Time Normalization (ATN), which preserves information from multiple consecutive time steps and normalizes using them. This setup allows us to introduce longer time dependencies into the traditional normalization methods without introducing any new trainable parameters. We present theoretical derivations for the gradient propagation and prove the weight scaling invariance property. Our experiments applying ATN to LN demonstrate consistent improvement on various tasks, such as Adding, Copying, and Denoise Problems and Language Modeling Problems.
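
The idea in the abstract can be illustrated with a minimal sketch. It is only one plausible reading of "normalizes using information from multiple consecutive time steps", not the authors' formulation: the function names, the window length k, the pooled mean and variance, and the toy data are all assumptions made for illustration, and the learnable gain and bias of LN are omitted.

```python
import numpy as np

def layer_norm(a, eps=1e-5):
    """Standard LN without gain/bias: statistics from the current step only."""
    return (a - a.mean()) / np.sqrt(a.var() + eps)

def assorted_time_norm(history, eps=1e-5):
    """Hypothetical ATN-style variant (assumption, not the paper's exact rule).

    history: array of shape (k, n) holding the last k preactivation
    vectors, newest last. The newest vector is normalized with a mean
    and variance pooled over all k * n entries in the window.
    """
    mu = history.mean()
    var = history.var()
    return (history[-1] - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
k, n, steps = 3, 8, 5          # window length, hidden size, sequence length
window, outputs = [], []
for t in range(steps):
    # Toy preactivations whose mean and scale drift with the time step.
    a_t = rng.normal(loc=t, scale=1.0 + t, size=n)
    window = (window + [a_t])[-k:]   # keep at most the last k steps
    outputs.append(assorted_time_norm(np.stack(window)))
```

With a window of length 1 the sketch reduces to plain LN, and because the statistics are pooled rather than learned, it adds no trainable parameters. Scaling every preactivation in the window by a constant leaves the output essentially unchanged (up to the eps term), which is the flavor of weight scaling invariance the abstract mentions.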
