Paper Title
A Survey on Dynamic Neural Networks for Natural Language Processing
Paper Authors
Paper Abstract
Effectively scaling large Transformer models is a main driver of recent advances in natural language processing. Dynamic neural networks, as an emerging research direction, are capable of scaling up neural networks with sub-linear increases in computation and time by dynamically adjusting their computational path based on the input. Dynamic neural networks could be a promising solution to the growing parameter counts of pretrained language models, allowing both model pretraining with trillions of parameters and faster inference on mobile devices. In this survey, we summarize the progress of three types of dynamic neural networks in NLP: skimming, mixture of experts, and early exit. We also highlight current challenges in dynamic neural networks and directions for future research.
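To make the notion of an input-dependent computational path concrete, below is a minimal early-exit sketch in PyTorch. The class name EarlyExitEncoder, the mean-pooled per-layer classifier heads, and the 0.9 confidence threshold are illustrative assumptions rather than the design of any particular surveyed model: inference simply stops at the first layer whose prediction is confident enough, so easy inputs consume less computation.

```python
import torch
import torch.nn as nn

class EarlyExitEncoder(nn.Module):
    """Minimal early-exit sketch. A stack of Transformer layers, each paired
    with a small classifier head; inference stops at the first layer whose
    prediction confidence clears a threshold. All names and hyperparameters
    here are illustrative assumptions, not from any specific surveyed paper."""

    def __init__(self, d_model=64, n_layers=4, n_classes=2, threshold=0.9):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            for _ in range(n_layers)
        )
        # One exit head per layer, trained (not shown) to mimic the final head.
        self.exits = nn.ModuleList(
            nn.Linear(d_model, n_classes) for _ in range(n_layers)
        )
        self.threshold = threshold

    @torch.no_grad()
    def forward(self, x):
        # x: (batch=1, seq_len, d_model). Early exit is per-example here.
        for layer, exit_head in zip(self.layers, self.exits):
            x = layer(x)
            probs = exit_head(x.mean(dim=1)).softmax(dim=-1)  # pool, classify
            conf, pred = probs.max(dim=-1)
            if conf.item() >= self.threshold:  # confident: skip later layers
                return pred, conf
        return pred, conf  # fell through: use the last layer's prediction

model = EarlyExitEncoder().eval()
tokens = torch.randn(1, 16, 64)  # a dummy "sentence" of 16 token embeddings
print(model(tokens))
```

Skimming and mixture of experts apply the same principle at different granularities: skimming spends less computation on unimportant tokens, while mixture of experts routes each input to a small subset of expert sub-networks.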