Multi-Window Data Augmentation Approach for Speech Emotion Recognition

Title

Multi-Window Data Augmentation Approach for Speech Emotion Recognition

Authors

Sarala Padi, Dinesh Manocha, Ram D. Sriram

Abstract

We present a Multi-Window Data Augmentation (MWA-SER) approach for speech emotion recognition (SER). MWA-SER is a unimodal approach that focuses on two key concepts: designing the speech augmentation method and building the deep learning model to recognize the underlying emotion of an audio signal. Our proposed multi-window augmentation approach generates additional data samples from the speech signal by employing multiple window sizes in the audio feature extraction process. We show that our augmentation method, combined with a deep learning model, improves speech emotion recognition performance. We evaluate the performance of our approach on three benchmark datasets: IEMOCAP, SAVEE, and RAVDESS. We show that the multi-window model improves SER performance and outperforms a single-window model. Finding the best window size is an essential step in audio feature extraction. We perform extensive experimental evaluations to find the best window choice and explore the effect of windowing on SER analysis.
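The augmentation idea can be sketched in a few lines: each utterance is framed with several window sizes, and every window size yields its own feature sequence carrying the original emotion label, so one recording becomes several training samples. This is a minimal, stdlib-only Python sketch, not the authors' implementation. The window sizes (25, 50, 100 ms), the 50% hop, the Hamming window, and the per-frame features (log energy and zero-crossing rate, standing in for the richer spectral features a real SER pipeline would use) are all assumptions, and the function names `frame_signal`, `frame_features`, and `multi_window_augment` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def frame_signal(signal: Sequence[float], sample_rate: int,
                 window_ms: float, hop_ratio: float = 0.5) -> List[Sequence[float]]:
    """Split a signal into overlapping frames of `window_ms` milliseconds.

    Assumption: hop = window * hop_ratio (50% overlap by default).
    Returns an empty list if the signal is shorter than one window.
    """
    win = int(sample_rate * window_ms / 1000)
    hop = max(1, int(win * hop_ratio))
    if win < 2 or len(signal) < win:
        return []
    return [signal[start:start + win]
            for start in range(0, len(signal) - win + 1, hop)]


def frame_features(frame: Sequence[float]) -> List[float]:
    """Illustrative per-frame features: Hamming-windowed log energy and ZCR."""
    n = len(frame)
    # Hamming window: w[i] = 0.54 - 0.46 * cos(2*pi*i / (n - 1))
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    log_energy = math.log(sum(v * v for v in windowed) + 1e-10)
    # Zero-crossing rate: fraction of adjacent sample pairs that change sign.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return [log_energy, crossings / (n - 1)]


def multi_window_augment(signal: Sequence[float], sample_rate: int, label: str,
                         window_sizes_ms: Sequence[float] = (25, 50, 100)
                         ) -> List[Tuple[float, List[List[float]], str]]:
    """Produce one (window_ms, feature_sequence, label) sample per window size.

    Every sample keeps the utterance's emotion label, so a single recording
    contributes len(window_sizes_ms) training examples.
    """
    samples = []
    for window_ms in window_sizes_ms:
        frames = frame_signal(signal, sample_rate, window_ms)
        if not frames:
            continue  # window longer than the signal: skip this size
        samples.append((window_ms, [frame_features(f) for f in frames], label))
    return samples


if __name__ == "__main__":
    sr = 16000
    tone = [math.sin(2 * math.pi * 440 * i / sr) for i in range(sr)]  # 1 s, 440 Hz
    for window_ms, feats, label in multi_window_augment(tone, sr, "angry"):
        print(f"{window_ms} ms window: {len(feats)} frames, label={label}")
```

For a 1-second, 16 kHz input, the three windows yield 79, 39, and 19 frames respectively. In a full pipeline, each of these sequences would be fed to the deep learning model as a separate labeled example; larger windows trade time resolution for frequency resolution, which is why the choice of window size matters.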
