Paper Title

Variational Speech Waveform Compression to Catalyze Semantic Communications

Authors

Shengshi Yao, Zixuan Xiao, Sixian Wang, Jincheng Dai, Kai Niu, Ping Zhang

Abstract

We propose a novel neural waveform compression method to catalyze emerging speech semantic communications. By introducing a nonlinear transform and variational modeling, we effectively capture the dependencies within speech frames and estimate the probability distribution of the speech features more accurately, which yields better compression performance. In particular, the speech signals are analyzed and synthesized by a pair of nonlinear transforms, yielding latent features. An entropy model with a hyperprior is built to capture the probability distribution of the latent features, followed by quantization and entropy coding. The proposed waveform codec can be optimized flexibly toward an arbitrary rate. Another appealing feature is that it can easily be optimized for any differentiable loss function, including the perceptual losses used in semantic communications. To further improve fidelity, we incorporate residual coding to mitigate the degradation caused by quantization distortion in the latent space. Results indicate that, at the same performance, the proposed method saves up to 27% of the coding rate compared with the widely used adaptive multi-rate wideband (AMR-WB) codec as well as emerging neural waveform coding methods.
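The rate side of the pipeline described in the abstract (quantize the latent features, then entropy-code them under a learned probability model) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each latent is modeled by a Gaussian whose mean and scale would come from the hyperprior, which is a common choice in hyperprior codecs but is not stated in the abstract. The names `quantize`, `estimated_bits`, and `rd_loss`, the trade-off weight `lam`, and all numeric values are hypothetical.

```python
import math


def gaussian_cdf(x, mean, scale):
    """CDF of a Gaussian with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))


def quantize(latents):
    """Round each latent to the nearest integer (Python rounds ties to even)."""
    return [round(y) for y in latents]


def estimated_bits(symbols, means, scales, floor=1e-9):
    """Ideal code length in bits of integer symbols under per-element Gaussians.

    The probability of symbol q is the Gaussian mass on [q - 0.5, q + 0.5].
    In a hyperprior model the means and scales would be predicted from side
    information; here they are simply passed in (assumption).
    """
    total = 0.0
    for q, mu, sigma in zip(symbols, means, scales):
        p = gaussian_cdf(q + 0.5, mu, sigma) - gaussian_cdf(q - 0.5, mu, sigma)
        total += -math.log2(max(p, floor))  # floor avoids log2(0)
    return total


def rd_loss(bits, distortion, lam):
    """Rate-distortion objective: rate plus lam times any distortion measure."""
    return bits + lam * distortion


# Hypothetical latent values and entropy-model parameters.
latents = [0.2, -1.7, 3.1, 0.0]
means = [0.0, -2.0, 3.0, 0.0]
symbols = quantize(latents)

# A sharper (more accurate) probability model yields a shorter code.
bits_narrow = estimated_bits(symbols, means, [0.5] * 4)
bits_wide = estimated_bits(symbols, means, [4.0] * 4)
print(symbols, bits_narrow, bits_wide)
```

The comparison between the narrow and wide scales shows why a more accurate estimate of the latent distribution lowers the coding rate. Varying `lam` in `rd_loss` is one way a single architecture could be trained toward different target rates, and the distortion argument can be any differentiable measure, including a perceptual one.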
