Paper Title
Composer Style Classification of Piano Sheet Music Images Using Language Model Pretraining
Paper Authors
Abstract
This paper studies composer style classification of piano sheet music images. Previous approaches to the composer classification task have been limited by a scarcity of data. We address this issue in two ways: (1) we recast the problem to be based on raw sheet music images rather than a symbolic music format, and (2) we propose an approach that can be trained on unlabeled data. Our approach first converts the sheet music image into a sequence of musical "words" based on the bootleg feature representation, and then feeds the sequence into a text classifier. We show that it is possible to significantly improve classifier performance by first training a language model on a set of unlabeled data, initializing the classifier with the pretrained language model weights, and then finetuning the classifier on a small amount of labeled data. We train AWD-LSTM, GPT-2, and RoBERTa language models on all piano sheet music images in IMSLP. We find that transformer-based architectures outperform CNN and LSTM models, and pretraining boosts classification accuracy for the GPT-2 model from 46% to 70% on a 9-way classification task. The trained model can also be used as a feature extractor that projects piano sheet music into a feature space that characterizes compositional style.
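The core preprocessing step described above, serializing the bootleg feature representation into a sequence of musical "words" that a text classifier can consume, can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names (`column_to_word`, `bootleg_to_text`) and the toy 8-dimensional staff are assumptions for illustration; the actual bootleg representation encodes each note event as a binary column over staff-line positions.

```python
# Hypothetical sketch: serialize bootleg-score columns into text "words".
# Assumption: each note event is a binary column vector over staff positions
# (a toy 8-dim staff here; the real bootleg representation is higher-dim).
# Packing each column's bits into a hex token turns a page of sheet music
# into a "sentence" that language-model pipelines can tokenize and classify.

def column_to_word(column):
    """Pack a binary bootleg column (list of 0/1 bits) into a hex-string token."""
    value = 0
    for bit in column:
        value = (value << 1) | bit  # shift in one bit per staff position
    return format(value, "x")

def bootleg_to_text(columns):
    """Turn a sequence of bootleg columns into a space-separated sentence."""
    return " ".join(column_to_word(c) for c in columns)

# Example: two note events on a toy 8-position staff.
events = [
    [0, 1, 0, 0, 1, 0, 0, 0],  # bits 01001000 -> 0x48
    [1, 0, 0, 0, 0, 0, 0, 1],  # bits 10000001 -> 0x81
]
print(bootleg_to_text(events))  # -> "48 81"
```

The resulting token sequence is what gets fed to the downstream language model, first for unsupervised pretraining on unlabeled IMSLP scores, then for supervised fine-tuning on the labeled composer-classification data.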