Sentiment-Aware Automatic Speech Recognition pre-training for enhanced Speech Emotion Recognition

Paper Title

Sentiment-Aware Automatic Speech Recognition pre-training for enhanced Speech Emotion Recognition

Authors

Ayoub Ghriss, Bo Yang, Viktor Rozgic, Elizabeth Shriberg, Chao Wang

Abstract

We propose a novel multi-task pre-training method for Speech Emotion Recognition (SER). We pre-train the SER model simultaneously on Automatic Speech Recognition (ASR) and sentiment classification tasks to make the acoustic ASR model more "emotion aware". We generate targets for the sentiment classification task using a text-to-sentiment model trained on publicly available data. Finally, we fine-tune the acoustic ASR model on emotion-annotated speech data. We evaluated the proposed approach on the MSP-Podcast dataset, where we achieved the best reported concordance correlation coefficient (CCC) of 0.41 for valence prediction.
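
The abstract reports its result as a concordance correlation coefficient (CCC). As a minimal sketch of how that metric is commonly computed (Lin's definition, using population variance and covariance), the following may help; the function name `concordance_cc` and the sample values are illustrative assumptions and are not taken from the paper.

```python
from statistics import fmean


def concordance_cc(predictions, targets):
    """Concordance correlation coefficient (CCC) of two equal-length sequences.

    CCC = 2 * cov(p, t) / (var(p) + var(t) + (mean(p) - mean(t)) ** 2)
    It equals 1 only when predictions match targets exactly, and it
    penalises both low correlation and shifts in mean or scale.
    """
    if len(predictions) != len(targets) or len(predictions) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")

    mean_p = fmean(predictions)
    mean_t = fmean(targets)

    # Population (divide-by-n) variance and covariance.
    var_p = fmean([(p - mean_p) ** 2 for p in predictions])
    var_t = fmean([(t - mean_t) ** 2 for t in targets])
    cov = fmean([(p - mean_p) * (t - mean_t)
                 for p, t in zip(predictions, targets)])

    denom = var_p + var_t + (mean_p - mean_t) ** 2
    if denom == 0:
        # Both sequences are the same constant; CCC is undefined here.
        return float("nan")
    return 2 * cov / denom


if __name__ == "__main__":
    # Hypothetical valence scores, for illustration only.
    print(concordance_cc([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect agreement
    print(concordance_cc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))  # same trend, offset by 1
```

In the second call the two sequences are perfectly correlated, yet the constant offset lowers the score to 4/7 (about 0.57), which is why CCC is stricter than plain Pearson correlation for dimensional emotion attributes such as valence.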
