Paper Title
Scaling Up Models and Data with $\texttt{t5x}$ and $\texttt{seqio}$
Paper Authors
Paper Abstract
Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves. Scaling can be complicated due to various factors including the need to distribute computation on supercomputer clusters (e.g., TPUs), prevent bottlenecks when infeeding data, and ensure reproducible results. In this work, we present two software libraries that ease these issues: $\texttt{t5x}$ simplifies the process of building and training large language models at scale while maintaining ease of use, and $\texttt{seqio}$ provides a task-based API for simple creation of fast and reproducible training data and evaluation pipelines. These open-source libraries have been used to train models with hundreds of billions of parameters on datasets with multiple terabytes of training data. Along with the libraries, we release configurations and instructions for T5-like encoder-decoder models as well as GPT-like decoder-only architectures. $\texttt{t5x}$ and $\texttt{seqio}$ are open source and available at https://github.com/google-research/t5x and https://github.com/google/seqio, respectively.
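To make the "task-based API" concrete, the sketch below registers a seqio Task and reads a deterministic tf.data pipeline from it. It is a minimal illustration assuming the public seqio API from the repository linked above; the task name, TFDS dataset string, and vocabulary path are hypothetical placeholders, not values taken from the paper.

import functools

import seqio

# Hypothetical vocabulary path for illustration only.
VOCAB = seqio.SentencePieceVocabulary("/path/to/sentencepiece.model")

seqio.TaskRegistry.add(
    "example_lm_task",  # hypothetical task name
    # Stream examples from a TFDS dataset instead of loading it into memory.
    source=seqio.TfdsDataSource(tfds_name="c4/en:3.0.1"),
    preprocessors=[
        # Map the raw "text" field to the "targets" feature declared below.
        functools.partial(seqio.preprocessors.rekey, key_map={"targets": "text"}),
        seqio.preprocessors.tokenize,
        seqio.preprocessors.append_eos,
    ],
    output_features={
        "targets": seqio.Feature(vocabulary=VOCAB),
    },
    metric_fns=[],
)

# Look up the registered task by name and build the input pipeline.
ds = seqio.get_mixture_or_task("example_lm_task").get_dataset(
    sequence_length={"targets": 512},
    split="train",
    shuffle=True,
    seed=42,
)

Because the pipeline is declared once and looked up by name, the same task string can be referenced from a training configuration, which is what keeps data preprocessing reproducible across runs.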