Audio Denoising for Robust Audio Fingerprinting

Paper Title

Audio Denoising for Robust Audio Fingerprinting

Author

Akesbi, Kamil

Abstract

Music discovery services let users identify songs from short mobile recordings. These solutions are often based on audio fingerprinting (AFP), and more specifically on the extraction of spectral peaks, which makes them robust to a number of distortions. Little work has been done to study the robustness of these algorithms to background noise captured in real environments. In particular, AFP systems still struggle when the signal-to-noise ratio is low, i.e., when the background noise is strong. In this project, we tackle this problem with deep learning (DL). We test a new hybrid strategy that consists of inserting a denoising DL model in front of a peak-based AFP algorithm. We simulate noisy music recordings using a realistic data augmentation pipeline, and train a DL model to denoise them. The denoising model limits the impact of background noise on the peaks extracted by the AFP system, improving its robustness to noise. We further propose a novel loss function that adapts the DL model to the considered AFP system, increasing its precision in terms of retrieved spectral peaks. To the best of our knowledge, this hybrid strategy has not been tested before.
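
The abstract does not detail the augmentation pipeline, so the following is only a minimal sketch of one generic step such a pipeline typically includes: adding background noise to a clean recording at a chosen signal-to-noise ratio. The helper names `power`, `mix_at_snr`, and `measured_snr_db`, as well as the toy sine tone and Gaussian noise, are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(music, noise, snr_db):
    """Add `noise` to `music`, scaled so the mixture has the requested SNR (dB).

    Hypothetical helper for illustration; not the paper's pipeline.
    """
    if len(music) != len(noise):
        raise ValueError("music and noise must have the same length")
    p_music, p_noise = power(music), power(noise)
    if p_music == 0 or p_noise == 0:
        raise ValueError("both signals must have non-zero power")
    # SNR_dB = 10 * log10(p_music / (gain**2 * p_noise)), solved for gain.
    gain = math.sqrt(p_music / (p_noise * 10 ** (snr_db / 10)))
    return [m + gain * n for m, n in zip(music, noise)]


def measured_snr_db(music, mixture):
    """Measure the SNR of `mixture`, given the clean `music` it contains."""
    residual = [y - m for m, y in zip(music, mixture)]
    return 10 * math.log10(power(music) / power(residual))


rng = random.Random(0)
# Toy stand-ins (assumptions): a 440 Hz tone at 8 kHz and Gaussian noise.
music = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]

# 0 dB means the noise is as strong as the music: the hard, low-SNR regime.
noisy = mix_at_snr(music, noise, snr_db=0.0)
```

In a training setup along the lines the abstract describes, pairs such as `(noisy, music)` would serve as input and target for the denoising model, with lower `snr_db` values corresponding to the stronger background noise on which peak-based fingerprinting struggles.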
