Paper Title
Sentiment-Aware Automatic Speech Recognition pre-training for enhanced Speech Emotion Recognition
Authors
Abstract
We propose a novel multi-task pre-training method for Speech Emotion Recognition (SER). We pre-train the SER model simultaneously on Automatic Speech Recognition (ASR) and sentiment classification tasks to make the acoustic ASR model more "emotion-aware". We generate targets for sentiment classification using a text-to-sentiment model trained on publicly available data. Finally, we fine-tune the acoustic ASR model on emotion-annotated speech data. We evaluate the proposed approach on the MSP-Podcast dataset, where we achieve the best reported concordance correlation coefficient (CCC) of 0.41 for valence prediction.
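The sketch below is not the authors' code. It illustrates two ideas from the abstract. The first is the evaluation metric, Lin's concordance correlation coefficient, computed here with population (1/N) moments. The second is a multi-task pre-training objective written as a weighted sum of an ASR loss and a sentiment-classification loss. The abstract does not say how the two losses are combined, so several details are assumptions: `multitask_loss`, `sentiment_weight`, the soft-target cross-entropy, and the idea that the ASR loss (for example CTC) is computed elsewhere and passed in as a scalar. All of these are hypothetical.

```python
import math
from typing import Sequence


def concordance_correlation_coefficient(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Lin's CCC: 2*cov / (var_true + var_pred + (mean_true - mean_pred)**2).

    Uses population (1/N) means, variances and covariance.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    denom = var_t + var_p + (mean_t - mean_p) ** 2
    if denom == 0.0:
        raise ValueError("CCC is undefined when both inputs are constant and equal")
    return 2.0 * cov / denom


def soft_cross_entropy(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Cross-entropy between softmax(logits) and a target distribution.

    Assumption: targets are probabilities from a text-to-sentiment model
    (a one-hot vector also works for hard labels).
    """
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(q * (z - log_z) for z, q in zip(logits, targets))


def multitask_loss(asr_loss: float, sentiment_logits: Sequence[float],
                   sentiment_targets: Sequence[float], sentiment_weight: float = 1.0) -> float:
    """Hypothetical weighted sum of an ASR loss and a sentiment loss."""
    return asr_loss + sentiment_weight * soft_cross_entropy(sentiment_logits, sentiment_targets)


if __name__ == "__main__":
    # A constant offset keeps Pearson correlation at 1.0 but lowers CCC to 4/7.
    print(concordance_correlation_coefficient([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```

CCC rewards predictions that track the labels in scale and offset as well as in correlation. That is why the shifted prediction in the usage example scores only about 0.571, even though it is perfectly correlated with the labels.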