Paper Title

Is Cross-Attention Preferable to Self-Attention for Multi-Modal Emotion Recognition?

Paper Authors

Vandana Rajan, Alessio Brutti, Andrea Cavallaro

Paper Abstract

Humans express their emotions via facial expressions, voice intonation and word choices. To infer the nature of the underlying emotion, recognition models may use a single modality, such as vision, audio, and text, or a combination of modalities. Generally, models that fuse complementary information from multiple modalities outperform their uni-modal counterparts. However, a successful model that fuses modalities requires components that can effectively aggregate task-relevant information from each modality. As cross-modal attention is seen as an effective mechanism for multi-modal fusion, in this paper we quantify the gain that such a mechanism brings compared to the corresponding self-attention mechanism. To this end, we implement and compare a cross-attention and a self-attention model. In addition to attention, each model uses convolutional layers for local feature extraction and recurrent layers for global sequential modelling. We compare the models using different modality combinations for a 7-class emotion classification task using the IEMOCAP dataset. Experimental results indicate that albeit both models improve upon the state-of-the-art in terms of weighted and unweighted accuracy for tri- and bi-modal configurations, their performance is generally statistically comparable. The code to replicate the experiments is available at https://github.com/smartcameras/SelfCrossAttn
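The pipeline described in the abstract (convolutional layers for local feature extraction, recurrent layers for global sequential modelling, then attention-based fusion) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation (that is available at the linked repository); the GRU encoders, the use of `nn.MultiheadAttention`, the layer sizes, and the two-modality (audio/text) setup are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the two fusion variants described
# in the abstract: each modality passes through Conv1d layers (local features)
# and a GRU (global sequential modelling); fusion then uses either
# self-attention over the concatenated sequence or cross-attention between
# modalities. Dimensions and layer choices are illustrative assumptions.
import torch
import torch.nn as nn


class ModalityEncoder(nn.Module):
    """Conv1d for local feature extraction + GRU for sequential modelling."""

    def __init__(self, in_dim: int, hidden: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_dim, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.gru = nn.GRU(hidden, hidden, batch_first=True)

    def forward(self, x):                 # x: (batch, time, in_dim)
        h = self.conv(x.transpose(1, 2))  # -> (batch, hidden, time)
        h, _ = self.gru(h.transpose(1, 2))
        return h                          # (batch, time, hidden)


class SelfAttnFusion(nn.Module):
    """Concatenate modality sequences and apply self-attention."""

    def __init__(self, hidden: int = 128, n_classes: int = 7):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.cls = nn.Linear(hidden, n_classes)

    def forward(self, a, t):              # a, t: (batch, time, hidden)
        z = torch.cat([a, t], dim=1)      # joint sequence over both modalities
        z, _ = self.attn(z, z, z)         # queries, keys, values from the same sequence
        return self.cls(z.mean(dim=1))    # pooled logits


class CrossAttnFusion(nn.Module):
    """Each modality attends to the other (queries from one, keys/values from the other)."""

    def __init__(self, hidden: int = 128, n_classes: int = 7):
        super().__init__()
        self.a2t = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.t2a = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.cls = nn.Linear(2 * hidden, n_classes)

    def forward(self, a, t):
        a_att, _ = self.a2t(a, t, t)      # audio queries attend to text
        t_att, _ = self.t2a(t, a, a)      # text queries attend to audio
        z = torch.cat([a_att.mean(dim=1), t_att.mean(dim=1)], dim=-1)
        return self.cls(z)


if __name__ == "__main__":
    audio = torch.randn(2, 100, 40)       # e.g. 40-dim acoustic frames (assumption)
    text = torch.randn(2, 20, 300)        # e.g. 300-dim word embeddings (assumption)
    enc_a, enc_t = ModalityEncoder(40), ModalityEncoder(300)
    a, t = enc_a(audio), enc_t(text)
    print(SelfAttnFusion()(a, t).shape)   # torch.Size([2, 7])
    print(CrossAttnFusion()(a, t).shape)  # torch.Size([2, 7])
```

In the cross-attention variant, information flows between modalities inside the attention operation itself; in the self-attention variant, the modalities are simply concatenated into one sequence before attention. The paper's comparison asks whether the former buys a measurable gain over the latter.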
