IR-GAN: Room Impulse Response Generator for Far-field Speech Recognition

Paper Title

IR-GAN: Room Impulse Response Generator for Far-field Speech Recognition

Authors

Anton Ratnarajah, Zhenyu Tang, Dinesh Manocha

Abstract

We present a Generative Adversarial Network (GAN) based room impulse response generator (IR-GAN) for generating realistic synthetic room impulse responses (RIRs). IR-GAN extracts acoustic parameters from captured real-world RIRs and uses these parameters to generate new synthetic RIRs. We use these generated synthetic RIRs to improve far-field automatic speech recognition in new environments that are different from the ones used in training datasets. In particular, we augment the far-field speech training set by convolving our synthesized RIRs with a clean LibriSpeech dataset. We evaluate the quality of our synthetic RIRs on the real-world LibriSpeech test set created using real-world RIRs from the BUT ReverbDB and AIR datasets. Our IR-GAN reports up to an 8.95% lower error rate than Geometric Acoustic Simulator (GAS) in far-field speech recognition benchmarks. We further improve the performance when we combine our synthetic RIRs with synthetic impulse responses generated using GAS. This combination can reduce the word error rate by up to 14.3% in far-field speech recognition benchmarks.
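The augmentation step mentioned in the abstract, convolving an RIR with clean speech to simulate a far-field recording, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `reverberate`, the peak rescaling, and the toy exponentially decaying RIR are all assumptions made here for the example. In practice the RIR would come from IR-GAN, GAS, or a measured dataset, and the clean signal from LibriSpeech.

```python
import numpy as np

def reverberate(clean, rir):
    """Simulate far-field speech by convolving clean speech with an RIR.

    Hypothetical helper for illustration; the rescaling below is an
    assumption, not something the abstract specifies.
    """
    clean = np.asarray(clean, dtype=np.float64)
    rir = np.asarray(rir, dtype=np.float64)
    # Full linear convolution, trimmed back to the utterance length.
    wet = np.convolve(clean, rir, mode="full")[: len(clean)]
    # Rescale so the reverberant peak matches the clean peak.
    peak = np.max(np.abs(wet))
    if peak > 0:
        wet *= np.max(np.abs(clean)) / peak
    return wet

# Toy inputs: one second of noise at 16 kHz standing in for clean speech,
# and decaying noise standing in for a real or generated RIR.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
t = np.arange(4000)
rir = rng.standard_normal(4000) * np.exp(-t / 800.0)
rir[0] = 1.0  # direct-path component

far_field = reverberate(clean, rir)
```

Trimming to the original length keeps the augmented utterance aligned with its transcript; whether to keep the reverberant tail instead is a design choice that depends on the training setup.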
