Exploring Quality and Generalizability in Parameterized Neural Audio Effects

Title

Exploring Quality and Generalizability in Parameterized Neural Audio Effects

Authors

Mitchell, William; Hawley, Scott H.

Abstract

Deep neural networks have shown promise for music audio signal processing applications, often surpassing prior approaches, particularly as end-to-end models in the waveform domain. Yet results to date have tended to be constrained by low sample rates, noise, narrow domains of signal types, and/or a lack of parameterized controls (i.e., "knobs"), leaving them not yet suitable for professional audio engineering workflows. This work expands on previously published research on modeling nonlinear, time-dependent signal processing effects used in music production with a deep neural network that can emulate the parameterized settings found on a piece of analog equipment, with the goal of eventually producing commercially viable, high-quality audio, i.e., a 44.1 kHz sampling rate at 16-bit resolution. The results in this paper highlight progress in modeling these effects through architecture and optimization changes, toward increasing computational efficiency, reducing noise (raising the signal-to-noise ratio), and extending to a larger variety of nonlinear audio effects. To these ends, we adopted a three-pronged approach targeting model speed, model accuracy, and model generalizability. With the exception of dataset manipulation, most of the presented methods provide marginal or no improvement in output accuracy over the original model. We found that limiting the audio content of the dataset, for example using a dataset of just a single instrument, provided a significant improvement in model accuracy over models trained on more general datasets.
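The abstract discusses signal-to-noise ratio alongside a 44.1 kHz, 16-bit quality target. As a rough illustration of how those two ideas relate, here is a minimal sketch, assuming SNR is measured by treating the difference between a target signal (e.g., the real effect's output) and a model's prediction as noise. The function names (`snr_db`, `quantize_to_16bit`, `ideal_quantization_snr_db`) and this exact SNR definition are assumptions for illustration, not the paper's evaluation code. It also computes the textbook quantization floor of 16-bit audio (about 98 dB for a full-scale sine) for comparison.

```python
import numpy as np


def snr_db(target, estimate):
    """SNR of `estimate` in dB, treating (target - estimate) as noise (assumed definition)."""
    target = np.asarray(target, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    signal_power = np.sum(target ** 2)
    noise_power = np.sum((target - estimate) ** 2)
    if noise_power == 0.0:
        return float("inf")  # estimate matches target exactly
    return float(10.0 * np.log10(signal_power / noise_power))


def quantize_to_16bit(x):
    """Round a float signal in [-1, 1] to 16-bit integer steps, then scale back to float."""
    q = np.clip(np.round(np.asarray(x, dtype=np.float64) * 32767.0), -32768, 32767)
    return q / 32767.0


def ideal_quantization_snr_db(bits):
    """Textbook SNR of a full-scale sine under uniform quantization: 6.02*N + 1.76 dB."""
    return 6.02 * bits + 1.76


if __name__ == "__main__":
    sample_rate = 44100                          # 44.1 kHz, as stated in the abstract
    t = np.arange(sample_rate) / sample_rate     # one second of sample times
    target = np.sin(2 * np.pi * 440.0 * t)       # hypothetical stand-in for an effect's output
    quantized = quantize_to_16bit(target)
    print(f"measured 16-bit SNR:    {snr_db(target, quantized):.2f} dB")
    print(f"theoretical 16-bit SNR: {ideal_quantization_snr_db(16):.2f} dB")
```

In this sketch, a model's SNR against the real effect could be compared with the roughly 98 dB floor of 16-bit quantization as one coarse yardstick for "16-bit quality". The paper may well report accuracy with other metrics (such as its training loss), so treat this comparison as illustrative only.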
