Paper title

A sinusoidal signal reconstruction method for the inversion of the mel-spectrogram

Authors

Anastasia Natsiou, Sean O'Leary

Abstract

The synthesis of sound via deep learning methods has recently received much attention. Some problems for deep learning approaches to sound synthesis relate to the amount of data needed to specify an audio signal and the necessity of preserving both the long- and short-time coherence of the synthesized signal. Visual time-frequency representations such as the log-mel-spectrogram have gained in popularity. The log-mel-spectrogram is a perceptually informed representation of audio that greatly compresses the amount of information required to describe the sound. However, because of this compression, the representation is not directly invertible. Both signal processing and machine learning techniques have previously been applied to the inversion of the log-mel-spectrogram, but both caused audible distortions in the synthesized sounds due to issues of temporal and spectral coherence. In this paper, we outline the application of a sinusoidal model to the inversion of the log-mel-spectrogram for pitched musical instrument sounds, outperforming state-of-the-art deep learning methods. The approach could later be used as a general decoding step from spectral to time intervals in neural applications.
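
The claim that the log-mel-spectrogram is not directly invertible can be illustrated with a small NumPy sketch. This is not the paper's method: the triangular filterbank, the HTK-style mel formula, the 16 kHz sample rate, the 512-point FFT, the 40 mel bands, and the pseudo-inverse reconstruction are all assumptions chosen for illustration. It only shows that mapping 257 frequency bins onto 40 mel bands discards information that a linear least-squares inverse cannot recover.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumed convention, not taken from the paper)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr=16000, n_fft=512, n_mels=40):
    """Triangular mel filterbank of shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    # n_mels triangles need n_mels + 2 edge frequencies, equally spaced in mel
    hz_edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        lo, mid, hi = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

fb = mel_filterbank()

# One frame of a synthetic magnitude spectrum (random values, for illustration only)
rng = np.random.default_rng(0)
magnitude = rng.random(fb.shape[1])

# Forward transform: linear magnitude -> mel bands -> log compression
log_mel = np.log(fb @ magnitude + 1e-10)

# Naive inverse: undo the log, then apply the pseudo-inverse of the filterbank
estimate = np.linalg.pinv(fb) @ np.exp(log_mel)

rank = np.linalg.matrix_rank(fb)
rel_error = np.linalg.norm(estimate - magnitude) / np.linalg.norm(magnitude)
print(f"filterbank shape: {fb.shape}, rank: {rank}")
print(f"relative reconstruction error: {rel_error:.3f}")
```

The filterbank's rank cannot exceed the number of mel bands, so many different magnitude spectra map to the same log-mel frame. Phase is also absent from the representation altogether, which is why inversion methods must supply additional structure, such as the sinusoidal model the abstract proposes for pitched sounds.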
