Paper Title

Visual Attention for Musical Instrument Recognition

Authors

Karn Watcharasupat, Siddharth Gururani, Alexander Lerch

Abstract

In the field of music information retrieval, simultaneously identifying the presence or absence of multiple musical instruments in a polyphonic recording remains a hard problem. Previous work has had some success in improving instrument classification by applying temporal attention in a multi-instance multi-label setting, while another line of work has suggested that pitch and timbre play a role in improving instrument recognition performance. In this project, we further explore the use of attention mechanisms in a timbral-temporal sense, à la visual attention, to improve the performance of musical instrument recognition using weakly-labeled data. Two approaches to this task have been explored. The first approach applies an attention mechanism to the sliding-window paradigm: the prediction based on each timbral-temporal "instance" is given an attention weight, and the weighted predictions are then aggregated to produce the final prediction. The second approach is based on a recurrent model of visual attention, in which the network attends only to parts of the spectrogram and decides where to attend next, given a limited number of "glimpses".
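As a rough illustration of the first approach, the sketch below shows one way attention-weighted aggregation of per-instance predictions could look. It is not the authors' implementation: the function names (`sigmoid`, `attention_pool`), the array shapes, and the choice of a softmax over instances for normalizing the attention weights are all assumptions made for this example.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def attention_pool(instance_logits, attention_scores):
    """Aggregate per-instance predictions into one clip-level prediction.

    instance_logits:  (n_instances, n_instruments) raw scores per instance
    attention_scores: (n_instances, n_instruments) unnormalized attention

    Both inputs are hypothetical outputs of some upstream network.
    """
    # Multi-label setting: each instrument gets an independent probability.
    probs = sigmoid(instance_logits)

    # Assumed normalization: softmax over the instance axis, per instrument.
    # Subtracting the column maximum keeps the exponentials numerically stable.
    scores = attention_scores - attention_scores.max(axis=0, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)

    # The weighted sum over instances gives the final clip-level prediction.
    clip_probs = (weights * probs).sum(axis=0)
    return clip_probs, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(5, 3))      # 5 instances, 3 instruments
    attention = rng.normal(size=(5, 3))
    clip_probs, weights = attention_pool(logits, attention)
    print(clip_probs.shape, weights.sum(axis=0))
```

With uniform attention scores this reduces to plain averaging of the instance predictions, so the attention weights can be read as a learned departure from mean pooling.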
