The JHU Multi-Microphone Multi-Speaker ASR System for the CHiME-6 Challenge

Paper title

The JHU Multi-Microphone Multi-Speaker ASR System for the CHiME-6 Challenge

Authors

Ashish Arora, Desh Raj, Aswin Shanmugam Subramanian, Ke Li, Bar Ben-Yair, Matthew Maciejewski, Piotr Żelasko, Paola García, Shinji Watanabe, Sanjeev Khudanpur

Abstract

This paper summarizes the JHU team's efforts in tracks 1 and 2 of the CHiME-6 challenge for distant multi-microphone conversational speech diarization and recognition in everyday home environments. We explore multi-array processing techniques at each stage of the pipeline, such as multi-array guided source separation (GSS) for enhancement and acoustic model training data, posterior fusion for speech activity detection, PLDA score fusion for diarization, and lattice combination for automatic speech recognition (ASR). We also report results with different acoustic model architectures, and integrate other techniques such as online multi-channel weighted prediction error (WPE) dereverberation and variational Bayes-hidden Markov model (VB-HMM) based overlap assignment to deal with reverberation and overlapping speakers, respectively. As a result of these efforts, our ASR systems achieve a word error rate of 40.5% and 67.5% on tracks 1 and 2, respectively, on the evaluation set. This is an improvement of 10.8% and 10.4% absolute, over the challenge baselines for the respective tracks.
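The abstract gives the final word error rates and the absolute improvements, but not the baseline figures or the relative gains. The sketch below derives both by simple arithmetic from the four numbers quoted above. The implied baselines are computed here rather than stated in the abstract, and the function names are illustrative only.

```python
def implied_baseline(system_wer: float, absolute_gain: float) -> float:
    """Baseline WER (percent) implied by a system WER and its absolute improvement."""
    return system_wer + absolute_gain


def relative_reduction(system_wer: float, absolute_gain: float) -> float:
    """Relative WER reduction (percent) with respect to the implied baseline."""
    baseline = implied_baseline(system_wer, absolute_gain)
    return 100.0 * absolute_gain / baseline


# Figures quoted in the abstract (evaluation set, WER in percent):
# (system WER, absolute improvement over the challenge baseline).
results = {"track 1": (40.5, 10.8), "track 2": (67.5, 10.4)}

for track, (wer, gain) in results.items():
    print(
        f"{track}: implied baseline {implied_baseline(wer, gain):.1f}%, "
        f"relative reduction {relative_reduction(wer, gain):.1f}%"
    )
```

Under these assumptions, the absolute gains correspond to roughly 21% and 13% relative WER reduction on tracks 1 and 2, respectively.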
