Describing emotions with acoustic property prompts for speech emotion recognition

Paper title

Describing emotions with acoustic property prompts for speech emotion recognition

Authors

Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh

Abstract

Emotions lie on a broad continuum, and treating them as a discrete set of classes limits a model's ability to capture the nuances in that continuum. The challenge is how to describe these nuances and how to enable a model to learn the descriptions. In this work, we devise a method to automatically create a description (or prompt) for a given audio clip by computing acoustic properties such as pitch, loudness, speech rate, and articulation rate. We pair each prompt with its corresponding audio using 5 different emotion datasets and train a neural network model on these audio-text pairs. We then evaluate the model on one additional dataset. We investigate how the model learns to associate the audio with the descriptions, which results in improved performance on Speech Emotion Recognition and Speech Audio Retrieval. We expect our findings to motivate research on describing the broad continuum of emotion.
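
The prompt-creation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state how acoustic values are turned into words, so the low/medium/high binning, the threshold values, the prompt wording, and the names `THRESHOLDS`, `bin_value`, and `make_prompt` are all assumptions made for illustration. The acoustic values are taken as already computed by some external tool.

```python
# Hypothetical bin edges (low cut-off, high cut-off) for each property.
# The abstract gives no thresholds, so these numbers are placeholders only.
THRESHOLDS = {
    "pitch": (150.0, 250.0),          # mean F0 in Hz
    "loudness": (55.0, 70.0),         # mean intensity in dB
    "speech rate": (3.0, 5.0),        # syllables per second, pauses included
    "articulation rate": (4.0, 6.0),  # syllables per second, pauses excluded
}


def bin_value(value: float, low: float, high: float) -> str:
    """Map a numeric value to a coarse label relative to two cut-offs."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "medium"


def make_prompt(properties: dict) -> str:
    """Build a text description from precomputed acoustic property values."""
    parts = []
    for name, (low, high) in THRESHOLDS.items():
        label = bin_value(properties[name], low, high)
        parts.append(f"{label} {name}")
    # Assumed prompt template; the abstract does not specify the wording.
    return "this person is speaking with " + ", ".join(parts)


# Made-up measurements for one utterance (not data from the paper).
example = {
    "pitch": 280.0,
    "loudness": 50.0,
    "speech rate": 4.1,
    "articulation rate": 6.5,
}
prompt = make_prompt(example)
print(prompt)
# this person is speaking with high pitch, low loudness, medium speech rate, high articulation rate
```

Each resulting string would then be paired with its audio clip to form one audio-text training pair, as the abstract describes.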
