Paper Title


A Novel Multi-Task Learning Method for Symbolic Music Emotion Recognition

Authors

Jibao Qiu, C. L. Philip Chen, Tong Zhang

Abstract


Symbolic music emotion recognition (SMER) is the task of predicting musical emotion from symbolic data such as MIDI and MusicXML. Previous work mainly focused on learning better representations via (masked) language model pre-training, but ignored the intrinsic structure of the music, which is extremely important to its emotional expression. In this paper, we present a simple multi-task framework for SMER that combines the emotion recognition task with emotion-related auxiliary tasks derived from the intrinsic structure of the music. The results show that our multi-task framework can be adapted to different models. Moreover, the labels for the auxiliary tasks are easy to obtain, which means our multi-task method requires no manually annotated labels other than emotion. Experiments on two publicly available datasets (EMOPIA and VGMIDI) show that our method performs better on the SMER task. Specifically, accuracy improves by 4.17 absolute points to 67.58 on the EMOPIA dataset and by 1.97 absolute points to 55.85 on the VGMIDI dataset. Ablation studies further demonstrate the effectiveness of the multi-task methods designed in this paper.
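The abstract describes combining the main emotion recognition loss with auxiliary losses from structure-derived tasks whose labels come from the symbolic data itself. The sketch below illustrates the general multi-task pattern in plain Python; the `aux_weight` coefficient, the function names, and the toy logits are illustrative assumptions, not details from the paper.

```python
import math

def softmax_cross_entropy(logits, label):
    # Numerically stable cross-entropy for a single example:
    # log(sum(exp(logits))) - logits[label]
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[label]

def multitask_loss(emotion_logits, emotion_label, aux_outputs, aux_weight=0.5):
    """Combine the main emotion loss with auxiliary classification losses.

    aux_outputs: list of (logits, label) pairs, one per auxiliary head,
    with labels extracted automatically from the music's structure.
    aux_weight is a hypothetical weighting hyperparameter.
    """
    main = softmax_cross_entropy(emotion_logits, emotion_label)
    aux = sum(softmax_cross_entropy(l, y) for l, y in aux_outputs)
    return main + aux_weight * aux

# Toy usage: a 4-class emotion head plus two auxiliary heads.
loss = multitask_loss(
    emotion_logits=[2.0, 0.1, -1.0, 0.3],
    emotion_label=0,
    aux_outputs=[([1.5, -0.5], 0), ([0.2, 0.8, -0.1], 1)],
)
print(loss)
```

In practice the heads would share an encoder over the token sequence, and the weighted sum of losses is backpropagated jointly; the point of the sketch is only how the auxiliary terms enter the objective.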
