VQ-T: RNN Transducers using Vector-Quantized Prediction Network States

Title

VQ-T: RNN Transducers using Vector-Quantized Prediction Network States

Authors

Jiatong Shi, George Saon, David Haws, Shinji Watanabe, Brian Kingsbury

Abstract

Beam search, the dominant ASR decoding algorithm for end-to-end models, generates tree-structured hypotheses. However, recent studies have shown that decoding with hypothesis merging can achieve a more efficient search with comparable or better performance. The full context carried by recurrent networks, however, is incompatible with hypothesis merging. We propose using vector-quantized long short-term memory units (VQ-LSTM) in the prediction network of RNN transducers. By training the discrete representation jointly with the ASR network, hypotheses can be actively merged for lattice generation. Our experiments on the Switchboard corpus show that the proposed VQ RNN transducers improve ASR performance over transducers with regular prediction networks while also producing denser lattices with a very low oracle word error rate (WER) for the same beam size. Additional language model rescoring experiments also demonstrate the effectiveness of the proposed lattice generation scheme.
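As a rough illustration of why a discrete prediction-network state makes merging possible, the Python sketch below assumes a fixed codebook with nearest-neighbour quantization and treats two same-frame hypotheses as mergeable whenever their quantized states coincide. The names `quantize`, `Hyp`, and `merge_hypotheses` are hypothetical. This is not the authors' VQ-LSTM (which learns the codebook jointly with the ASR network) nor their lattice-generation code; it only shows the general idea.

```python
import math
from dataclasses import dataclass, field


def quantize(state, codebook):
    """Return the index of the codebook vector nearest to `state` (squared L2)."""
    best_idx, best_dist = 0, math.inf
    for idx, code in enumerate(codebook):
        dist = sum((s - c) ** 2 for s, c in zip(state, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


@dataclass
class Hyp:
    tokens: tuple          # token history of this hypothesis
    log_prob: float        # accumulated log-probability
    code: int              # hypothetical: quantized prediction-network state index
    merged_from: list = field(default_factory=list)  # alternative histories kept as lattice arcs


def merge_hypotheses(hyps):
    """Merge hypotheses sharing a quantized state (assumed equivalent futures).

    The best-scoring history is kept as the representative, the others are
    recorded as lattice alternatives, and scores are combined with
    log-sum-exp. A Viterbi-style merge would keep only the max score instead.
    """
    groups = {}
    for h in hyps:
        groups.setdefault(h.code, []).append(h)

    merged = []
    for code, group in groups.items():
        group.sort(key=lambda h: h.log_prob, reverse=True)
        best = group[0]
        m = best.log_prob
        total = m + math.log(sum(math.exp(h.log_prob - m) for h in group))
        merged.append(Hyp(best.tokens, total, code, [h.tokens for h in group[1:]]))

    merged.sort(key=lambda h: h.log_prob, reverse=True)
    return merged


if __name__ == "__main__":
    codebook = [(0.0, 0.0), (1.0, 1.0)]
    beam = [
        Hyp(("a", "b"), math.log(0.3), quantize((0.1, -0.1), codebook)),
        Hyp(("a", "c"), math.log(0.2), quantize((0.05, 0.2), codebook)),
        Hyp(("d",), math.log(0.1), quantize((0.9, 1.2), codebook)),
    ]
    for h in merge_hypotheses(beam):
        print(h.tokens, round(math.exp(h.log_prob), 3), h.merged_from)
```

In this toy beam, the first two hypotheses quantize to the same code, so they collapse into one beam entry whose alternative history survives as a lattice arc; this frees a beam slot, which is one plausible reason a quantized state can yield denser lattices for the same beam size.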
