Paper Title
Composer Style Classification of Piano Sheet Music Images Using Language Model Pretraining
Paper Authors
Abstract
This paper studies composer style classification of piano sheet music images. Previous approaches to the composer classification task have been limited by a scarcity of data. We address this issue in two ways: (1) we recast the problem to be based on raw sheet music images rather than a symbolic music format, and (2) we propose an approach that can be trained on unlabeled data. Our approach first converts the sheet music image into a sequence of musical "words" based on the bootleg feature representation, and then feeds the sequence into a text classifier. We show that it is possible to significantly improve classifier performance by first training a language model on a set of unlabeled data, initializing the classifier with the pretrained language model weights, and then finetuning the classifier on a small amount of labeled data. We train AWD-LSTM, GPT-2, and RoBERTa language models on all piano sheet music images in IMSLP. We find that transformer-based architectures outperform CNN and LSTM models, and pretraining boosts classification accuracy for the GPT-2 model from 46% to 70% on a 9-way classification task. The trained model can also be used as a feature extractor that projects piano sheet music into a feature space that characterizes compositional style.
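The core preprocessing step described above, serializing the bootleg feature representation into a sequence of musical "words" that a text classifier can consume, can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names (`column_to_word`, `bootleg_to_text`) and the toy 8-dimensional staff are assumptions for illustration; the actual bootleg representation encodes each note event as a binary column over staff-line positions.

```python
# Hypothetical sketch: serialize bootleg-score columns into text "words".
# Assumption: each note event is a binary column vector over staff positions
# (a toy 8-dim staff here; the real bootleg representation is higher-dim).
# Packing each column's bits into a hex token turns a page of sheet music
# into a "sentence" that language-model pipelines can tokenize and classify.

def column_to_word(column):
    """Pack a binary bootleg column (list of 0/1 bits) into a hex-string token."""
    value = 0
    for bit in column:
        value = (value << 1) | bit  # shift in one bit per staff position
    return format(value, "x")

def bootleg_to_text(columns):
    """Turn a sequence of bootleg columns into a space-separated sentence."""
    return " ".join(column_to_word(c) for c in columns)

# Example: two note events on a toy 8-position staff.
events = [
    [0, 1, 0, 0, 1, 0, 0, 0],  # bits 01001000 -> 0x48
    [1, 0, 0, 0, 0, 0, 0, 1],  # bits 10000001 -> 0x81
]
print(bootleg_to_text(events))  # -> "48 81"
```

The resulting token sequence is what gets fed to the downstream language model, first for unsupervised pretraining on unlabeled IMSLP scores, then for supervised fine-tuning on the labeled composer-classification data.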