Paper Title

Unaligned Supervision For Automatic Music Transcription in The Wild

Authors

Ben Maman, Amit H. Bermano

Abstract

Multi-instrument Automatic Music Transcription (AMT), or the decoding of a musical recording into semantic musical content, is one of the holy grails of Music Information Retrieval. Current AMT approaches are restricted to piano and (some) guitar recordings, due to difficult data collection. In order to overcome data collection barriers, previous AMT approaches attempt to employ musical scores in the form of a digitized version of the same song or piece. The scores are typically aligned using audio features and strenuous human intervention to generate training labels. We introduce NoteEM, a method for simultaneously training a transcriber and aligning the scores to their corresponding performances, in a fully-automated process. Using this unaligned supervision scheme, complemented by pseudo-labels and pitch-shift augmentation, our method enables training on in-the-wild recordings with unprecedented accuracy and instrumental variety. Using only synthetic data and unaligned supervision, we report SOTA note-level accuracy on the MAPS dataset, and large favorable margins on cross-dataset evaluations. We also demonstrate robustness and ease of use; we report comparable results when training on a small, easily obtainable, self-collected dataset, and we propose alternative labels for the MusicNet dataset, which we show to be more accurate. Our project page is available at https://benadar293.github.io
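
The abstract describes aligning an unaligned score to a performance while the transcriber is being trained, but does not name the alignment algorithm. The sketch below illustrates one plausible alignment step under stated assumptions: it uses plain dynamic time warping (DTW), a mean absolute difference cost, and a hypothetical helper named dtw_align. None of these are taken from the abstract, so this should be read as an illustration of the general idea and not as the NoteEM implementation.

```python
def dtw_align(frame_probs, score_roll):
    """Warp an unaligned score onto audio frames with dynamic time warping.

    Assumed inputs (hypothetical, for illustration only):
      frame_probs: T rows of per-pitch probabilities predicted by a transcriber.
      score_roll:  S rows of 0/1 pitch activations taken from the score.
    Returns T label rows, one score row chosen for each audio frame.
    """
    T, S = len(frame_probs), len(score_roll)
    INF = float("inf")

    def cost(t, s):
        # Mean absolute difference between the prediction and the score row.
        row_p, row_s = frame_probs[t], score_roll[s]
        return sum(abs(p - q) for p, q in zip(row_p, row_s)) / len(row_p)

    # Accumulated cost table with the usual three DTW moves.
    acc = [[INF] * S for _ in range(T)]
    acc[0][0] = cost(0, 0)
    for t in range(T):
        for s in range(S):
            if t == 0 and s == 0:
                continue
            best_prev = min(
                acc[t - 1][s] if t > 0 else INF,
                acc[t][s - 1] if s > 0 else INF,
                acc[t - 1][s - 1] if t > 0 and s > 0 else INF,
            )
            acc[t][s] = cost(t, s) + best_prev

    # Backtrack from the last cell to recover the warping path.
    # If several score steps map to one frame, the earliest one is kept.
    t, s = T - 1, S - 1
    frame_to_score = [0] * T
    frame_to_score[t] = s
    while t > 0 or s > 0:
        candidates = []
        if t > 0 and s > 0:
            candidates.append((acc[t - 1][s - 1], t - 1, s - 1))
        if t > 0:
            candidates.append((acc[t - 1][s], t - 1, s))
        if s > 0:
            candidates.append((acc[t][s - 1], t, s - 1))
        _, t, s = min(candidates)
        frame_to_score[t] = s

    return [list(score_roll[i]) for i in frame_to_score]


# A two-step score stretched over three audio frames.
frames = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
score = [[1, 0], [0, 1]]
print(dtw_align(frames, score))  # [[1, 0], [1, 0], [0, 1]]
```

In a scheme like the one the abstract outlines, a step of this kind would alternate with retraining: the current transcriber predicts frame probabilities, the score is warped onto them to produce labels, and the transcriber is trained again on those labels.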
