Paper Title
LegoNN: Building Modular Encoder-Decoder Models
Paper Authors
Paper Abstract
State-of-the-art encoder-decoder models (e.g., for machine translation (MT) or automatic speech recognition (ASR)) are constructed and trained end-to-end as an atomic unit. No component of the model can be (re-)used without the others, making it impossible to share parts, e.g., a high-resource decoder, across tasks. We describe LegoNN, a procedure for building encoder-decoder architectures in a way that their parts can be applied to other tasks without any fine-tuning. To achieve this reusability, the interface between encoder and decoder modules is grounded to a sequence of marginal distributions over a pre-defined discrete vocabulary. We present two approaches for ingesting these marginals: one is differentiable, allowing the flow of gradients across the entire network, and the other is gradient-isolating. To enable the portability of decoder modules between MT tasks for different source languages and across other tasks like ASR, we introduce a modality-agnostic encoder which consists of a length control mechanism to dynamically adapt encoders' output lengths to match the expected input length range of pre-trained decoders. We present several experiments to demonstrate the effectiveness of LegoNN models: a trained language-generation LegoNN decoder module from a German-English (De-En) MT task can be reused, without any fine-tuning, for the Europarl English ASR and the Romanian-English (Ro-En) MT tasks, matching or beating the performance of baselines. After fine-tuning, LegoNN models improve the Ro-En MT task by 1.5 BLEU points and achieve a 12.5% relative WER reduction on the Europarl ASR task. To show how the approach generalizes, we compose a LegoNN ASR model from three modules, each learned within a different end-to-end trained model on three different datasets, achieving an overall WER reduction of 19.5%.
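The marginal-distribution interface is the core of the modularity claim, so a small sketch may help make it concrete. Below is a minimal PyTorch illustration of the two ingestion modes the abstract describes; the module names (`MarginalInterface`, `IngestMarginals`), tensor shapes, and the weighted-embedding formulation are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class MarginalInterface(nn.Module):
    """Encoder-side output head: grounds encoder states in a sequence of
    marginal distributions over a pre-defined discrete vocabulary."""

    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, enc_out: torch.Tensor) -> torch.Tensor:
        # enc_out: (batch, time, d_model) -> marginals: (batch, time, vocab)
        return torch.softmax(self.proj(enc_out), dim=-1)


class IngestMarginals(nn.Module):
    """Decoder-side input layer: any decoder trained against this interface
    can consume any encoder that emits marginals over the same vocabulary."""

    def __init__(self, vocab_size: int, d_model: int, differentiable: bool = True):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.differentiable = differentiable

    def forward(self, marginals: torch.Tensor) -> torch.Tensor:
        if not self.differentiable:
            # Gradient-isolating mode: cut the gradient path at the interface
            # so the modules on either side are trained independently.
            marginals = marginals.detach()
        # Expected embedding under the marginals: a probability-weighted sum
        # of vocabulary embeddings. When not detached, gradients flow from
        # the decoder back into the encoder through this matmul.
        return marginals @ self.embed.weight  # (batch, time, d_model)


# Toy usage: any encoder/decoder pair that agrees on the vocabulary can be swapped.
enc_states = torch.randn(2, 50, 512)                  # stand-in encoder output
marginals = MarginalInterface(512, 10000)(enc_states)
dec_inputs = IngestMarginals(10000, 512)(marginals)   # ready for a decoder stack
```

Grounding the interface in a probability simplex over a shared vocabulary, rather than in free-form hidden states, is what allows independently trained modules to be recombined without fine-tuning: every module sees inputs from the same well-defined distribution space regardless of which encoder produced them.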