Paper Title

A Pyramid Recurrent Network for Predicting Crowdsourced Speech-Quality Ratings of Real-World Signals

Authors

Xuan Dong, Donald S. Williamson

Abstract

The real-world capabilities of objective speech quality measures are limited because current measures (1) are developed from simulated data that does not adequately model real environments, or (2) predict objective scores that are not always strongly correlated with subjective ratings. Additionally, no large dataset of real-world signals with listener quality ratings currently exists, although such a dataset would help facilitate real-world assessment. In this paper, we collect and predict the perceptual quality of real-world speech signals that are evaluated by human listeners. We first collect a large quality rating dataset by conducting crowdsourced listening studies on two real-world corpora. We further develop a novel approach that predicts human quality ratings using a pyramid bidirectional long short-term memory (pBLSTM) network with an attention mechanism. The results show that the proposed model achieves statistically lower estimation errors than prior assessment approaches, and that its predicted scores strongly correlate with human judgments.
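
The abstract names the architecture but does not describe its internals. As a rough illustration only, the sketch below shows the two ideas the name suggests, under stated assumptions: a pyramid step that concatenates adjacent frames to halve the time resolution (as pyramid BLSTMs are commonly described), and an attention step that pools frames with softmax weights. The function names `pyramid_reduce` and `attention_pool` are hypothetical, the recurrent layers are omitted entirely, and the attention scores are supplied by hand where a trained model would learn them. This is not the authors' implementation.

```python
import math

def pyramid_reduce(frames):
    """Concatenate each pair of consecutive frames, halving the sequence length.

    frames: list of equal-length feature vectors (lists of floats).
    A trailing unpaired frame is dropped (an assumption of this sketch).
    """
    usable = len(frames) - len(frames) % 2
    return [frames[t] + frames[t + 1] for t in range(0, usable, 2)]

def attention_pool(frames, scores):
    """Softmax the per-frame scores and return the weighted average frame."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract max for stability
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]

# Four 2-dimensional frames become two 4-dimensional frames.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
reduced = pyramid_reduce(frames)

# Equal scores give equal weights, so pooling reduces to a plain average.
pooled = attention_pool(reduced, scores=[0.0, 0.0])
```

In a full model, a bidirectional LSTM layer would sit between successive pyramid steps, and the pooled vector would feed a regression layer that outputs the predicted quality rating.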
