Small-Footprint Open-Vocabulary Keyword Spotting with Quantized LSTM Networks

Paper title

Small-Footprint Open-Vocabulary Keyword Spotting with Quantized LSTM Networks

Authors

Théodore Bluche, Maël Primet, Thibault Gisselbrecht

Abstract

We explore a keyword-based spoken language understanding system, in which the intent of the user can directly be derived from the detection of a sequence of keywords in the query. In this paper, we focus on an open-vocabulary keyword spotting method, allowing the user to define their own keywords without having to retrain the whole model. We describe the different design choices leading to a fast and small-footprint system, able to run on tiny devices, for any arbitrary set of user-defined keywords, without training data specific to those keywords. The model, based on a quantized long short-term memory (LSTM) neural network, trained with connectionist temporal classification (CTC), weighs less than 500KB. Our approach takes advantage of some properties of the predictions of CTC-trained networks to calibrate the confidence scores and implement a fast detection algorithm. The proposed system outperforms a standard keyword-filler model approach.
