Conditional Drums Generation using Compound Word Representations

Paper Title

Conditional Drums Generation using Compound Word Representations

Authors

Dimos Makris, Guo Zixun, Maximos Kaliakatsos-Papakostas, Dorien Herremans

Abstract

The field of automatic music composition has seen great progress in recent years, specifically with the invention of transformer-based architectures. When using any deep learning model which considers music as a sequence of events with multiple complex dependencies, the selection of a proper data representation is crucial. In this paper, we tackle the task of conditional drums generation using a novel data encoding scheme inspired by the Compound Word representation, a tokenization process of sequential data. To this end, we present a sequence-to-sequence architecture where a Bidirectional Long Short-Term Memory (BiLSTM) Encoder receives information about the conditioning parameters (i.e., accompanying tracks and musical attributes), while a Transformer-based Decoder with relative global attention produces the generated drum sequences. We conducted experiments to thoroughly compare the effectiveness of our method to several baselines. Quantitative evaluation shows that our model is able to generate drum sequences that have similar statistical distributions and characteristics to the training corpus. These features include syncopation, compression ratio, and symmetry, among others. We also verified, through a listening test, that generated drum sequences sound pleasant, natural and coherent while they "groove" with the given accompaniment.
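
The abstract does not spell out the encoding scheme, so the following is only a rough sketch of the general idea behind compound-word tokenization: bundling the attributes that belong to one time step into a single multi-field token instead of spreading them over several consecutive tokens. The `DrumEvent` fields, the token names, and the grouping rule are illustrative assumptions, not the paper's actual encoding.

```python
from typing import NamedTuple


class DrumEvent(NamedTuple):
    """One drum hit. Field names are hypothetical, chosen for illustration."""
    bar: int       # bar index
    position: int  # step within the bar (e.g. on a 16th-note grid)
    pitch: int     # General MIDI drum pitch (36 kick, 38 snare, 42 closed hi-hat)


def to_flat_tokens(events):
    """Flat encoding: every attribute becomes its own token in the sequence."""
    tokens = []
    for e in events:
        tokens += [f"Bar_{e.bar}", f"Pos_{e.position}", f"Pitch_{e.pitch}"]
    return tokens


def to_compound_words(events):
    """Compound encoding: all hits at one (bar, position) form a single token."""
    grouped = {}
    for e in events:
        grouped.setdefault((e.bar, e.position), []).append(e.pitch)
    # One tuple per time step, ordered by bar and then position.
    return [
        (bar, pos, tuple(sorted(pitches)))
        for (bar, pos), pitches in sorted(grouped.items())
    ]


# Kick and hi-hat together on the downbeat, snare one beat later.
events = [DrumEvent(0, 0, 36), DrumEvent(0, 0, 42), DrumEvent(0, 4, 38)]
flat = to_flat_tokens(events)
compound = to_compound_words(events)
```

In this toy case the flat encoding needs nine tokens while the compound encoding needs two, which is the kind of sequence shortening that makes grouped representations attractive for attention-based decoders.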
