Paper title
Improving Lyrics Alignment through Joint Pitch Detection
Paper authors
Paper abstract
In recent years, the accuracy of automatic lyrics alignment methods has increased considerably. Yet many current approaches employ frameworks designed for automatic speech recognition (ASR) and do not exploit properties specific to music. Pitch is an important musical attribute of the singing voice, but current systems often ignore it because the lyrics content is considered independent of the pitch. In practice, however, the two are temporally correlated, as note onsets often correlate with phoneme onsets. At the same time, pitch is usually annotated with high temporal accuracy in ground-truth data, while the timing of lyrics is often available only at the line (or word) level. In this paper, we propose a multi-task learning approach for lyrics alignment that incorporates pitch and can thus make use of a new source of highly accurate temporal information. Our results show that this approach indeed improves alignment accuracy. As an additional contribution, we show that integrating boundary detection into the forced-alignment algorithm reduces cross-line errors, which improves accuracy even further.
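The multi-task idea in the abstract can be illustrated with a minimal sketch of a joint training objective. The abstract does not specify the loss functions or their weighting, so everything here is an assumption made for illustration: the names `cross_entropy`, `multitask_loss`, and `pitch_weight` are hypothetical, and per-frame cross-entropy is used for both tasks purely for simplicity (a real alignment system would more likely use a sequence-level loss for the lyrics branch, since lyrics timing is often known only at the line or word level).

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target class for a single frame."""
    return -math.log(probs[target])


def multitask_loss(phoneme_probs, phoneme_targets,
                   pitch_probs, pitch_targets, pitch_weight=0.5):
    """Weighted sum of mean per-frame phoneme and pitch losses.

    pitch_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    n = len(phoneme_targets)
    phoneme_loss = sum(
        cross_entropy(p, t) for p, t in zip(phoneme_probs, phoneme_targets)
    ) / n
    pitch_loss = sum(
        cross_entropy(p, t) for p, t in zip(pitch_probs, pitch_targets)
    ) / n
    # The pitch term contributes gradient from precisely timed annotations.
    return phoneme_loss + pitch_weight * pitch_loss
```

Under this assumed formulation, a shared acoustic model would feed both output heads, so the precisely timed pitch annotations could shape the representation that the lyrics branch also uses.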