Real-Time Packet Loss Concealment With Mixed Generative and Predictive Model

Paper Title

Real-Time Packet Loss Concealment With Mixed Generative and Predictive Model

Authors

Valin, Jean-Marc; Mustafa, Ahmed; Montgomery, Christopher; Terriberry, Timothy B.; Klingbeil, Michael; Smaragdis, Paris; Krishnaswamy, Arvindh

Abstract

As deep speech enhancement algorithms have recently demonstrated capabilities greatly surpassing their traditional counterparts for suppressing noise, reverberation and echo, attention is turning to the problem of packet loss concealment (PLC). PLC is a challenging task because it not only involves real-time speech synthesis, but also frequent transitions between the received audio and the synthesized concealment. We propose a hybrid neural PLC architecture where the missing speech is synthesized using a generative model conditioned using a predictive model. The resulting algorithm achieves natural concealment that surpasses the quality of existing conventional PLC algorithms and ranked second in the Interspeech 2022 PLC Challenge. We show that our solution not only works for uncompressed audio, but is also applicable to a modern speech codec.
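The abstract describes the hybrid architecture only at a high level: when a packet is missing, a predictive model estimates what the missing speech should look like, and a generative model synthesizes audio conditioned on that estimate. The sketch below illustrates that control flow under stated assumptions. `conceal_stream`, `toy_predict` and `toy_generate` are hypothetical names, the "features" are a single RMS level, and both models are trivial stand-ins rather than neural networks. The abstract does not specify the models, feature set or frame size, so none of this should be read as the authors' implementation. The sketch also does not address the transitions between received and synthesized audio that the abstract identifies as a key difficulty.

```python
import math
from typing import Callable, List, Optional, Sequence

Frame = List[float]
FRAME_LEN = 4  # hypothetical frame size, chosen only for the toy example


def conceal_stream(
    packets: Sequence[Optional[Frame]],
    predict_features: Callable[[List[Frame]], List[float]],
    generate_frame: Callable[[List[float], List[Frame]], Frame],
) -> List[Frame]:
    """Return one output frame per packet; None marks a lost packet."""
    history: List[Frame] = []
    output: List[Frame] = []
    for packet in packets:
        if packet is not None:
            frame = packet  # received packet: pass the audio through unchanged
        else:
            # Predictive model: estimate conditioning features for the missing frame.
            features = predict_features(history)
            # Generative model: synthesize audio conditioned on the predicted features.
            frame = generate_frame(features, history)
        output.append(frame)
        history.append(frame)  # concealed frames also become context for later losses
    return output


def toy_predict(history: List[Frame]) -> List[float]:
    """Hypothetical stand-in for a predictive model: reuse the last frame's RMS level."""
    if not history:
        return [0.0]
    last = history[-1]
    return [math.sqrt(sum(x * x for x in last) / len(last))]


def toy_generate(features: List[float], history: List[Frame]) -> Frame:
    """Hypothetical stand-in for a generative model: a constant frame at half the predicted level."""
    return [0.5 * features[0]] * FRAME_LEN
```

For example, `conceal_stream([[1.0] * 4, None, None, [0.2] * 4], toy_predict, toy_generate)` passes the first and last frames through and fills the two gaps with frames at levels 0.5 and 0.25. Each concealed frame is conditioned on the one before it, so the toy output decays across a burst of losses. Splitting the work between a predictor (what to say) and a generator (how it sounds) mirrors the division of labour the abstract describes.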
