Low Rank Fusion based Transformers for Multimodal Sequences

Paper title

Low Rank Fusion based Transformers for Multimodal Sequences

Authors

Saurav Sahay, Eda Okur, Shachi H Kumar, Lama Nachman

Abstract

Our senses work individually yet in a coordinated fashion to express our emotional intentions. In this work, we experiment with modeling modality-specific sensory signals to attend to our latent multimodal emotional intentions, and vice versa, expressed via low-rank multimodal fusion and multimodal transformers. The low-rank factorization of multimodal fusion among the modalities helps represent approximate multiplicative latent signal interactions. Motivated by the work of \cite{tsai2019MULT} and \cite{Liu_2018}, we present our transformer-based cross-fusion architecture without any over-parameterization of the model. The low-rank fusion helps represent the latent signal interactions, while the modality-specific attention helps focus on relevant parts of the signal. We present two methods for multimodal sentiment and emotion recognition, report results on the CMU-MOSEI, CMU-MOSI, and IEMOCAP datasets, and show that our models have fewer parameters, train faster, and perform comparably to many larger fusion-based architectures.
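The low-rank factorization mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation. It only shows the general idea that a multiplicative (outer-product) fusion of three modality vectors, contracted with a rank-r weight tensor, can be computed without ever building the large tensor. The function names, the dimensions, the rank, and the choice to append a constant 1 to each modality vector are all assumptions made for illustration.

```python
import numpy as np

def low_rank_fusion(inputs, factors):
    """Fuse modality vectors using per-modality low-rank factors.

    inputs:  list of 1-D arrays z_m with shape (d_m,)
    factors: list of arrays with shape (rank, d_m + 1, d_out)
    Hypothetical helper; shapes and names are illustrative assumptions.
    """
    rank, _, d_out = factors[0].shape
    fused = np.ones((rank, d_out))
    for z, w in zip(inputs, factors):
        z1 = np.append(z, 1.0)                    # assumed: append a constant 1
        fused *= np.einsum("d,rdo->ro", z1, w)    # project, then multiply across modalities
    return fused.sum(axis=0)                      # sum over the rank dimension

def full_tensor_fusion(inputs, factors):
    """Reference computation for three modalities that builds the full tensors."""
    za, zv, zt = (np.append(z, 1.0) for z in inputs)
    Z = np.einsum("a,v,t->avt", za, zv, zt)               # outer product of the inputs
    wa, wv, wt = factors
    W = np.einsum("rao,rvo,rto->avto", wa, wv, wt)        # rank-r weight tensor
    return np.einsum("avt,avto->o", Z, W)

rng = np.random.default_rng(0)
dims, rank, d_out = [5, 4, 6], 3, 2                       # illustrative sizes only
inputs = [rng.normal(size=d) for d in dims]
factors = [rng.normal(size=(rank, d + 1, d_out)) for d in dims]

h_low = low_rank_fusion(inputs, factors)
h_full = full_tensor_fusion(inputs, factors)
print(np.allclose(h_low, h_full))
```

Both functions compute the same fused vector, but the low-rank version stores only rank × (d_m + 1) × d_out weights per modality. The full version materializes a weight tensor whose size grows with the product of the modality dimensions.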
