Paper Title

Multi-modal embeddings using multi-task learning for emotion recognition

Authors

Aparna Khare, Srinivas Parthasarathy, Shiva Sundaram

Abstract

General embeddings like word2vec, GloVe, and ELMo have shown considerable success in natural language tasks. These embeddings are typically extracted from models built on general tasks such as skip-gram modeling and natural language generation. In this paper, we extend this work from natural language understanding to multi-modal architectures that use audio, visual, and textual information for machine learning tasks. The embeddings in our network are extracted using the encoder of a transformer model trained with multi-task learning. We use person identification and automatic speech recognition as the tasks in our embedding generation framework. We tune and evaluate the embeddings on the downstream task of emotion recognition and demonstrate that, on the CMU-MOSEI dataset, the embeddings can be used to improve over previous state-of-the-art results.
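The abstract describes the architecture only at a high level. The following is a minimal PyTorch sketch of the general idea, assuming a shared transformer encoder over pre-fused audio/visual/text features with two pre-training heads (person identification and ASR). All layer counts, dimensions, the fusion strategy, and the mean-pooling step are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

class MultiTaskEmbedder(nn.Module):
    """Hypothetical sketch: a shared transformer encoder produces
    multi-modal embeddings, trained jointly with a person-ID head
    and an ASR head (multi-task learning)."""

    def __init__(self, feat_dim=512, n_heads=8, n_layers=4,
                 n_speakers=1000, vocab_size=5000):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Task heads used only during multi-task pre-training;
        # sizes here are placeholders, not values from the paper.
        self.person_id_head = nn.Linear(feat_dim, n_speakers)
        self.asr_head = nn.Linear(feat_dim, vocab_size)  # e.g. for a CTC loss

    def forward(self, fused_feats):
        # fused_feats: (batch, time, feat_dim) pre-fused
        # audio/visual/text features (fusion method assumed).
        h = self.encoder(fused_feats)      # shared frame-level embeddings
        emb = h.mean(dim=1)                # utterance-level embedding
        return emb, self.person_id_head(emb), self.asr_head(h)
```

In this sketch, adapting to the downstream emotion-recognition task would correspond to discarding the two pre-training heads and fine-tuning a small classifier on the pooled embedding `emb`.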
