Paper title
Multimodal Learning with Transformers: A Survey
Paper authors
Paper abstract
Transformer is a promising neural network learner, and has achieved great success in various machine learning tasks. Thanks to the recent prevalence of multimodal applications and big data, Transformer-based multimodal learning has become a hot topic in AI research. This paper presents a comprehensive survey of Transformer techniques oriented at multimodal data. The main contents of this survey include: (1) a background of multimodal learning, Transformer ecosystem, and the multimodal big data era, (2) a theoretical review of Vanilla Transformer, Vision Transformer, and multimodal Transformers, from a geometrically topological perspective, (3) a review of multimodal Transformer applications, via two important paradigms, i.e., for multimodal pretraining and for specific multimodal tasks, (4) a summary of the common challenges and designs shared by the multimodal Transformer models and applications, and (5) a discussion of open problems and potential research directions for the community.