Paper Title
Highway Transformer: Self-Gating Enhanced Self-Attentive Networks
Paper Authors
Paper Abstract
Self-attention mechanisms have achieved striking state-of-the-art (SOTA) progress on various sequence learning tasks, building on multi-headed dot-product attention that attends to the full global context at every position. Through a pseudo information highway, we introduce a gated component, the Self-Dependency Unit (SDU), which incorporates LSTM-style gating units to replenish internal semantic importance within the multi-dimensional latent space of individual representations. The subsidiary content-based SDU gates modulate the information flow of latent embeddings through skip connections, yielding a clear margin in convergence speed under gradient descent algorithms. We unveil the role of the gating mechanism in context-based Transformer modules, hypothesizing that SDU gates, especially in shallow layers, push the model faster toward suboptimal points during optimization.
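The abstract describes an LSTM-style gate computed from the representation itself and added back through a skip connection. The following is a minimal numpy sketch of that idea, not the paper's actual implementation: the function and parameter names (`sdu_gate`, `W_g`, `W_c`) are illustrative assumptions, and the gate/candidate formulation mirrors generic LSTM-style gating rather than the exact SDU equations.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sdu_gate(h, W_g, b_g, W_c, b_c):
    """Illustrative self-dependency gating (hypothetical parameterization):
    an LSTM-style gate computed from the representation h itself (no
    recurrence), whose output re-enters through a skip connection."""
    gate = sigmoid(h @ W_g + b_g)       # content-based gate, values in (0, 1)
    candidate = np.tanh(h @ W_c + b_c)  # modulated candidate values
    return h + gate * candidate         # skip connection around the gated path

# Toy usage: 4 token representations of dimension 8
rng = np.random.default_rng(0)
d = 8
h = rng.standard_normal((4, d))
W_g, W_c = rng.standard_normal((d, d)), rng.standard_normal((d, d))
b_g, b_c = np.zeros(d), np.zeros(d)
out = sdu_gate(h, W_g, b_g, W_c, b_c)
print(out.shape)  # (4, 8)
```

Because the gate is a sigmoid of the input itself, each latent dimension can be amplified or suppressed independently, which is the "internal semantic importance" the abstract refers to; the additive skip term keeps gradients flowing even when the gate saturates.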