Paper title
MIDI-Sheet Music Alignment Using Bootleg Score Synthesis
Paper authors
Paper abstract
MIDI-sheet music alignment is the task of finding correspondences between a MIDI representation of a piece and its corresponding sheet music images. Rather than using optical music recognition to bridge the gap between sheet music and MIDI, we explore an alternative approach: projecting the MIDI data into pixel space and performing alignment in the image domain. Our method converts the MIDI data into a crude representation of the score that only contains rectangular floating notehead blobs, a process we call bootleg score synthesis. Furthermore, we project sheet music images into the same bootleg space by applying a deep watershed notehead detector and filling in the bounding boxes around each detected notehead. Finally, we align the bootleg representations using a simple variant of dynamic time warping. On a dataset of 68 real scanned piano scores from IMSLP and corresponding MIDI performances, our method achieves a 97.3% accuracy at an error tolerance of one second, outperforming several baseline systems that employ optical music recognition.
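The final alignment step can be illustrated with a minimal sketch. The abstract only says that a simple variant of dynamic time warping is used, so the code below assumes textbook DTW with the three standard steps and is not the authors' exact variant. It also assumes that each bootleg score is a sequence of columns, that each column is a set of integer staff positions where a notehead blob appears, and that columns are compared with a Jaccard-style cost. The names `column_cost` and `dtw_align` are hypothetical.

```python
def column_cost(a, b):
    """Dissimilarity between two bootleg columns (sets of staff positions).

    Assumption: 1 - Jaccard overlap; the paper's actual cost may differ.
    """
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def dtw_align(midi_cols, sheet_cols):
    """Textbook DTW with steps (1,1), (1,0), (0,1).

    Returns (total_cost, path), where path is a list of
    (midi_index, sheet_index) pairs from start to end.
    """
    n, m = len(midi_cols), len(sheet_cols)
    inf = float("inf")
    # acc[i][j]: cheapest cost of aligning the first i MIDI columns
    # with the first j sheet columns.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = column_cost(midi_cols[i - 1], sheet_cols[j - 1])
            acc[i][j] = c + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])

    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return acc[n][m], path


# Toy example: the sheet-side bootleg score has one duplicated column.
midi_cols = [{1, 5}, {3}, {2, 6}]
sheet_cols = [{1, 5}, {1, 5}, {3}, {2, 6}]
total_cost, path = dtw_align(midi_cols, sheet_cols)
print(total_cost, path)
```

In this toy input the duplicated sheet column is absorbed by mapping the first MIDI column to two sheet columns, which is the kind of timing mismatch DTW is designed to handle. A real system would additionally map path indices back to MIDI timestamps and sheet pixel coordinates.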