Paper Title
Few-Shot Keyword Spotting With Prototypical Networks
Authors
Abstract
Recognizing a particular command or a keyword, keyword spotting has been widely used in many voice interfaces such as Amazon's Alexa and Google Home. In order to recognize a set of keywords, most of the recent deep learning based approaches use a neural network trained with a large number of samples to identify certain pre-defined keywords. This restricts the system from recognizing new, user-defined keywords. Therefore, we first formulate this problem as a few-shot keyword spotting and approach it using metric learning. To enable this research, we also synthesize and publish a Few-shot Google Speech Commands dataset. We then propose a solution to the few-shot keyword spotting problem using temporal and dilated convolutions on prototypical networks. Our comparative experimental results demonstrate keyword spotting of new keywords using just a small number of samples.
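The abstract names prototypical networks and metric learning but does not spell out the classification rule. As a rough illustration only, the sketch below shows the standard prototypical-network decision step: each keyword's prototype is the mean of its few support embeddings, and a query is scored by a softmax over negative squared Euclidean distances to those prototypes. It is not the authors' implementation. The embedding network (temporal and dilated convolutions in the paper) is assumed to have already run, and the function names `class_prototypes` and `classify`, the keyword labels, and the 2-D embedding values are all hypothetical.

```python
import math
from collections import defaultdict


def class_prototypes(support):
    """Average the support embeddings of each label into one prototype.

    support: list of (label, embedding) pairs, embeddings of equal length.
    """
    grouped = defaultdict(list)
    for label, embedding in support:
        grouped[label].append(embedding)
    return {
        label: [sum(dim) / len(embeddings) for dim in zip(*embeddings)]
        for label, embeddings in grouped.items()
    }


def classify(query, prototypes):
    """Softmax over negative squared Euclidean distances to each prototype."""
    scores = {
        label: -sum((q - p) ** 2 for q, p in zip(query, proto))
        for label, proto in prototypes.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Toy 2-way, 2-shot episode with made-up embeddings (assumed values).
support = [
    ("yes", [1.0, 0.0]), ("yes", [0.8, 0.2]),
    ("no", [0.0, 1.0]), ("no", [0.2, 0.8]),
]
prototypes = class_prototypes(support)
probs = classify([0.9, 0.1], prototypes)
predicted = max(probs, key=probs.get)
```

Because the prototypes are computed from the support set at inference time, a new user-defined keyword can in principle be added by supplying a few embedded examples of it, without retraining the classifier head.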