Generative Data Augmentation Guided by Triplet Loss for Speech Emotion Recognition

Paper Title

Generative Data Augmentation Guided by Triplet Loss for Speech Emotion Recognition

Authors

Shijun Wang, Hamed Hemati, Jón Guðnason, Damian Borth

Abstract

Speech Emotion Recognition (SER) is crucial for human-computer interaction but still remains a challenging problem because of two major obstacles: data scarcity and imbalance. Many datasets for SER are substantially imbalanced, where data utterances of one class (most often Neutral) are much more frequent than those of other classes. Furthermore, only a few data resources are available for many existing spoken languages. To address these problems, we exploit a GAN-based augmentation model guided by a triplet network, to improve SER performance given imbalanced and insufficient training data. We conduct experiments and demonstrate: 1) With a highly imbalanced dataset, our augmentation strategy significantly improves the SER performance (+8% recall score compared with the baseline). 2) Moreover, in a cross-lingual benchmark, where we train a model with enough source language utterances but very few target language utterances (around 50 in our experiments), our augmentation strategy brings benefits for the SER performance of all three target languages.
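
For readers unfamiliar with the term, the triplet objective mentioned in the abstract can be illustrated with the standard triplet margin loss. The sketch below is a generic NumPy version, not the authors' implementation: the function name `triplet_margin_loss`, the Euclidean distance, the margin of 1.0, and the toy 2-D embeddings are all assumptions made for illustration. The abstract does not specify how the triplet network is coupled to the GAN.

```python
import numpy as np

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet margin loss over batches of embeddings, shape (batch, dim).

    Hypothetical helper for illustration; the paper's exact loss may differ.
    """
    # Euclidean distance from each anchor to its same-class (positive) sample
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    # Euclidean distance from each anchor to its different-class (negative) sample
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    # Penalize triplets where the positive is not closer than the negative by `margin`
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

# Toy 2-D embeddings (made-up values, not data from the paper)
anchor = np.array([[0.0, 0.0]])
near = np.array([[0.0, 1.0]])   # distance 1 from the anchor
far = np.array([[3.0, 4.0]])    # distance 5 from the anchor

# Well-separated triplet: same-class sample is near, other-class sample is far
loss_easy = triplet_margin_loss(anchor, near, far)
# Violating triplet: same-class sample is far, other-class sample is near
loss_hard = triplet_margin_loss(anchor, far, near)
```

In this toy case the well-separated triplet incurs zero loss, while the violating triplet is penalized by the distance gap plus the margin. This is the kind of signal that pushes same-emotion embeddings together and different-emotion embeddings apart.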
