VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition

Paper title

VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition

Authors

Quan Wang, Ignacio Lopez Moreno, Mert Saglam, Kevin Wilson, Alan Chiao, Renjie Liu, Yanzhang He, Wei Li, Jason Pelecanos, Marily Nika, Alexander Gruenstein

Abstract

We introduce VoiceFilter-Lite, a single-channel source separation model that runs on the device to preserve only the speech signals from a target user, as part of a streaming speech recognition system. Delivering such a model presents numerous challenges: it should improve performance when the input signal consists of overlapped speech, and it must not hurt speech recognition performance under any other acoustic condition. In addition, the model must be tiny and fast, and must perform inference in a streaming fashion, so that it has minimal impact on CPU, memory, battery, and latency. We propose novel techniques to meet these multi-faceted requirements, including a new asymmetric loss and an adaptive runtime suppression strength. We also show that such a model can be quantized as an 8-bit integer model and run in real time.
