Paper Title
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models
Paper Authors
Paper Abstract
Generalist models, which are capable of performing diverse multi-modal tasks in a task-agnostic way within a single model, have been explored recently. Though hoped to serve as an alternative path toward general-purpose AI, existing generalist models are still at an early stage, with limited modality and task coverage. To empower multi-modal task-scaling and speed up this line of research, we release a generalist model learning system, OFASys, built on top of a declarative task interface named multi-modal instruction. At the core of OFASys is the idea of decoupling multi-modal task representations from the underlying model implementations. In OFASys, a task involving multiple modalities can be defined declaratively, even with just a single line of code. The system automatically generates task plans from such instructions for training and inference. It also facilitates multi-task training for diverse multi-modal workloads. As a starting point, we provide presets of 7 different modalities and 23 highly diverse example tasks in OFASys, with which we also develop a first-of-its-kind single model, OFA+, that can handle text, image, speech, video, and motion data. The single OFA+ model achieves, on average, 95% of the performance of 15 task-finetuned models with only 16% of their parameters, showcasing the performance reliability of the multi-modal task-scaling provided by OFASys. Available at https://github.com/OFA-Sys/OFASys.
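To make the "single line of code" claim concrete, the sketch below illustrates the flavor of such a declarative multi-modal instruction. The slot syntax (`[MODALITY:field]`) mirrors examples shown in the OFASys repository, but the instruction string and the toy parser here are purely illustrative, not the system's actual implementation:

```python
# Hedged sketch: a task is declared as one string whose bracketed slots name
# a modality and a data field; everything left of "->" is input, the rest is
# output. The parser below is illustrative only, not OFASys internals.
import re

# Image captioning declared in a single line: an image input maps to a
# text output (slot names "img" and "cap" are assumed for illustration).
instruction = "[IMAGE:img] what does the image describe? -> [TEXT:cap]"

def parse_slots(instr: str):
    """Split an instruction at '->' and extract (modality, field) slots
    such as ('IMAGE', 'img') from each side."""
    src, tgt = (part.strip() for part in instr.split("->"))
    slot = re.compile(r"\[([A-Z]+):([a-z_]+)\]")
    return slot.findall(src), slot.findall(tgt)

inputs, outputs = parse_slots(instruction)
print(inputs)   # [('IMAGE', 'img')]
print(outputs)  # [('TEXT', 'cap')]
```

Decoupling the task description (the string) from the model lets the system map each slot to a modality-specific preprocessor and generate a training/inference plan automatically, which is the core idea the abstract describes.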