Paper Title
An Objective Measure of Quality for Time-Scale Modification of Audio
Paper Authors
Paper Abstract
Objective evaluation of audio processed with Time-Scale Modification (TSM) remains an open problem. Recently, a dataset of time-scaled audio with subjective quality labels was published and used to create an initial objective measure of quality. In this paper, an improved objective measure of quality for time-scaled audio is proposed. The measure uses hand-crafted features and a fully connected network to predict subjective mean opinion scores. Basic and Advanced Perceptual Evaluation of Audio Quality (PEAQ) features are used in addition to nine features specific to TSM artefacts. Six methods of alignment are explored, with interpolation of the reference magnitude spectrum to the length of the test magnitude spectrum giving the best performance. The proposed measure achieves a mean Root Mean Squared Error of 0.487 and a mean Pearson correlation of 0.865, equivalent to the 98th and 82nd percentiles of subjective sessions, respectively. The proposed measure is used to evaluate time-scale modification algorithms, finding that Elastique gives the highest objective quality for solo instrument and voice signals, while the Identity Phase-Locking Phase Vocoder gives the highest objective quality for music signals and the best overall quality. The objective measure is available at https://www.github.com/zygurt/TSM.
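As a rough illustration of two ideas in the abstract, the Python sketch below stretches a reference magnitude spectrogram to the frame count of the test signal and computes the two reported evaluation statistics (RMSE and Pearson correlation). It is not the authors' implementation. The function names, the array layout (frequency bins by frames) and the choice of linear interpolation are assumptions made for this example; the abstract does not state which interpolation scheme is used.

```python
import numpy as np


def interpolate_reference(ref_mag: np.ndarray, n_test_frames: int) -> np.ndarray:
    """Stretch a reference magnitude spectrogram to n_test_frames frames.

    ref_mag has shape (n_bins, n_ref_frames); the result has shape
    (n_bins, n_test_frames). Linear interpolation is an assumption here.
    """
    n_bins, n_ref_frames = ref_mag.shape
    # Put both frame sequences on a shared normalised time axis [0, 1].
    ref_pos = np.linspace(0.0, 1.0, n_ref_frames)
    test_pos = np.linspace(0.0, 1.0, n_test_frames)
    # np.interp works on 1-D data, so each frequency bin is handled in turn.
    return np.stack(
        [np.interp(test_pos, ref_pos, ref_mag[k]) for k in range(n_bins)]
    )


def rmse(predicted, target) -> float:
    """Root mean squared error between predicted and subjective scores."""
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((predicted - target) ** 2)))


def pearson(predicted, target) -> float:
    """Pearson correlation coefficient between predicted and subjective scores."""
    return float(np.corrcoef(predicted, target)[0, 1])


if __name__ == "__main__":
    # Hypothetical data: a 3-bin, 4-frame reference stretched to 7 frames.
    ref = np.arange(12, dtype=float).reshape(3, 4)
    aligned = interpolate_reference(ref, 7)
    print(aligned.shape)

    # Hypothetical predicted and subjective mean opinion scores.
    predicted_mos = [3.1, 2.4, 4.0, 1.8]
    subjective_mos = [3.0, 2.9, 4.2, 1.5]
    print(rmse(predicted_mos, subjective_mos))
    print(pearson(predicted_mos, subjective_mos))
```

With the reference and test spectrograms at the same length, frame-wise features can be compared directly, which is the apparent purpose of the alignment step described in the abstract.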