Paper Title
Transcription Is All You Need: Learning to Separate Musical Mixtures with Score as Supervision
Authors
Abstract
Most music source separation systems require large collections of isolated sources for training, which can be difficult to obtain. In this work, we use musical scores, which are comparatively easy to obtain, as weak labels for training a source separation system. In contrast with previous score-informed separation approaches, our system does not require isolated sources, and the score is used only as a training target; it is not required for inference. Our model consists of a separator that outputs a time-frequency mask for each instrument and a transcriptor that acts as a critic, providing both temporal and frequency supervision to guide the learning of the separator. A harmonic mask constraint is introduced as another way of leveraging score information during training, and we propose two novel adversarial losses for additional fine-tuning of both the transcriptor and the separator. Results demonstrate that using score information outperforms temporal weak labels, and that the adversarial structures lead to further improvements in both separation and transcription performance.
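To make the two score-related ideas in the abstract more concrete, the following is a minimal sketch of how a harmonic mask might be derived from a score and how per-instrument time-frequency masks are applied to a mixture. It is not the authors' implementation: the abstract does not say how the harmonic mask is constructed, so the binary mask, the linear-frequency bin layout, the number of harmonics, and the function names `harmonic_mask` and `apply_masks` are all assumptions made for illustration.

```python
import numpy as np


def midi_to_hz(pitch):
    """Convert a MIDI note number to frequency in Hz (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((pitch - 69) / 12.0)


def harmonic_mask(piano_roll, n_bins, sample_rate, n_harmonics=8, width=1):
    """Build a binary (n_bins, n_frames) mask from a (128, n_frames) piano roll.

    Assumption: bins are linearly spaced from 0 Hz to Nyquist, as in an STFT.
    Bins within `width` bins of any of the first `n_harmonics` harmonics of an
    active note are set to 1; all other bins stay 0.
    """
    n_frames = piano_roll.shape[1]
    mask = np.zeros((n_bins, n_frames), dtype=np.float32)
    nyquist = sample_rate / 2.0
    hz_per_bin = nyquist / (n_bins - 1)
    # Iterate over every (pitch, frame) pair where the score has an active note.
    for pitch, frame in zip(*np.nonzero(piano_roll)):
        f0 = midi_to_hz(pitch)
        for k in range(1, n_harmonics + 1):
            freq = k * f0
            if freq > nyquist:
                break
            center = int(round(freq / hz_per_bin))
            lo = max(center - width, 0)
            hi = min(center + width + 1, n_bins)
            mask[lo:hi, frame] = 1.0
    return mask


def apply_masks(mixture_mag, masks):
    """Apply per-instrument masks of shape (n_instruments, n_bins, n_frames)
    to a (n_bins, n_frames) mixture magnitude spectrogram."""
    return masks * mixture_mag[None, :, :]


# Usage: one A4 note active in frame 1 of a four-frame score.
piano_roll = np.zeros((128, 4))
piano_roll[69, 1] = 1
mask = harmonic_mask(piano_roll, n_bins=513, sample_rate=16000)
mixture = np.full((513, 4), 2.0)
estimates = apply_masks(mixture, np.stack([mask, 1.0 - mask]))
```

In a training setup like the one described, a mask of this kind could be used to restrict where the separator is allowed to place energy for each instrument; how the constraint actually enters the loss is not specified in the abstract.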