Paper Title

Towards Low-distortion Multi-channel Speech Enhancement: The ESPNet-SE Submission to The L3DAS22 Challenge

Authors

Yen-Ju Lu, Samuele Cornell, Xuankai Chang, Wangyou Zhang, Chenda Li, Zhaoheng Ni, Zhong-Qiu Wang, Shinji Watanabe

Abstract

This paper describes our submission to the L3DAS22 Challenge Task 1, which consists of speech enhancement with 3D Ambisonic microphones. The core of our approach combines Deep Neural Network (DNN) driven complex spectral mapping with linear beamformers such as the multi-frame multi-channel Wiener filter. Our proposed system has two DNNs with a linear beamformer in between. Both DNNs are trained to perform complex spectral mapping, using a combination of waveform and magnitude spectrum losses. The estimated signal from the first DNN is used to drive a linear beamformer, and the beamforming result and this enhanced signal are used as extra inputs to the second DNN, which refines the estimate. Then, starting from this new estimate, the linear beamformer and the second DNN are run iteratively. The proposed method was ranked first in the challenge, achieving a ranking metric of 0.984 on the evaluation set, versus 0.833 for the challenge baseline.
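The control flow described in the abstract (first DNN, beamformer, second DNN, then repeated beamformer and second-DNN passes) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: `mcwf`, `enhance`, `dnn1`, `dnn2`, and `n_iters` are hypothetical names, the two DNNs are passed in as arbitrary callables, and the beamformer is a simplified single-frame multi-channel Wiener filter rather than the multi-frame variant named in the abstract.

```python
import numpy as np


def mcwf(mix, target_est, eps=1e-6):
    """Simplified single-frame multi-channel Wiener filter (assumption:
    the paper uses a multi-frame variant, which is not reproduced here).

    mix:        (C, F, T) complex STFT of the multi-channel mixture
    target_est: (F, T) complex STFT of the current target estimate
    returns:    (F, T) complex beamformed signal
    """
    C, F, T = mix.shape
    out = np.empty((F, T), dtype=complex)
    for f in range(F):
        Y = mix[:, f, :]                           # (C, T) mixture at this frequency
        s = target_est[f, :]                       # (T,) target estimate
        R = Y @ Y.conj().T / T + eps * np.eye(C)   # mixture spatial covariance
        p = Y @ s.conj() / T                       # mixture/target cross-correlation
        w = np.linalg.solve(R, p)                  # filter minimizing E|s - w^H y|^2
        out[f] = w.conj() @ Y                      # apply w^H to every frame
    return out


def enhance(mix, dnn1, dnn2, n_iters=2):
    """Hypothetical two-stage pipeline following the abstract's description.

    dnn1(mix) -> (F, T) initial estimate
    dnn2(mix, beamformed, estimate) -> (F, T) refined estimate
    """
    est = dnn1(mix)                  # first DNN produces the initial estimate
    for _ in range(n_iters):
        bf = mcwf(mix, est)          # estimate drives the linear beamformer
        est = dnn2(mix, bf, est)     # second DNN refines using both signals
    return est
```

With stand-in callables such as `dnn1 = lambda m: m[0]` and `dnn2 = lambda m, b, e: 0.5 * (b + e)`, `enhance(mix, dnn1, dnn2)` runs end to end on a `(C, F, T)` complex array; in a real system both callables would be trained networks performing complex spectral mapping.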
