Paper Title
A Treatise On FST Lattice Based MMI Training
Paper Authors
Paper Abstract
Maximum mutual information (MMI) has become one of the two de facto methods for sequence-level training of speech recognition acoustic models. This paper aims to isolate, identify, and bring forward the implicit modelling decisions induced by the design and implementation of the standard finite state transducer (FST) lattice based MMI training framework. The paper particularly investigates the necessity of maintaining a preselected numerator alignment and raises the importance of determinizing FST denominator lattices on the fly. The efficacy of on-the-fly FST lattice determinization is shown mathematically to guarantee discrimination at the hypothesis level, and demonstrated empirically by training deep CNN models on an 18k-hour Mandarin dataset and a 2.8k-hour English dataset. On assistant and dictation tasks, the approach achieves a 2.3-4.6% relative WER reduction (WERR) over the standard FST lattice based approach.
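For context, the MMI objective referred to in the abstract can be written in its conventional textbook form; the sketch below uses standard notation (acoustic observations O_u for utterance u, reference word sequence W_u, acoustic scale kappa) and is not drawn from this paper's own equations. The denominator sum runs over the competing hypotheses encoded by the FST denominator lattice.

% Standard MMI objective (conventional formulation; notation is assumed, not the paper's).
% O_u: acoustic observations of utterance u, W_u: reference transcript,
% W: competing hypotheses from the denominator lattice, \kappa: acoustic scale.
\begin{equation}
  \mathcal{F}_{\mathrm{MMI}}(\theta)
  = \sum_{u}
    \log
    \frac{p_\theta(O_u \mid W_u)^{\kappa}\, P(W_u)}
         {\sum_{W} p_\theta(O_u \mid W)^{\kappa}\, P(W)}
\end{equation}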