Paper Title
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Paper Authors
Paper Abstract
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design computer agents with intelligent capabilities such as understanding, reasoning, and learning through integrating multiple communicative modalities, including linguistic, acoustic, visual, tactile, and physiological messages. With the recent interest in video understanding, embodied autonomous agents, text-to-image generation, and multisensor fusion in application domains such as healthcare and robotics, multimodal machine learning has brought unique computational and theoretical challenges to the machine learning community given the heterogeneity of data sources and the interconnections often found between modalities. However, the breadth of progress in multimodal research has made it difficult to identify the common themes and open questions in the field. By synthesizing a broad range of application domains and theoretical frameworks from both historical and recent perspectives, this paper is designed to provide an overview of the computational and theoretical foundations of multimodal machine learning. We start by defining three key principles of modality heterogeneity, connections, and interactions that have driven subsequent innovations, and propose a taxonomy of six core technical challenges: representation, alignment, reasoning, generation, transference, and quantification covering historical and recent trends. Recent technical achievements will be presented through the lens of this taxonomy, allowing researchers to understand the similarities and differences across new approaches. We end by motivating several open problems for future research as identified by our taxonomy.