Paper title
Black-box Adaptation of ASR for Accented Speech
Authors
Abstract
We introduce the problem of adapting a black-box, cloud-based ASR system to speech from a target accent. While leading online ASR services obtain impressive performance on mainstream accents, they perform poorly on sub-populations: we observed that the word error rate (WER) achieved by Google's ASR API on Indian accents is almost twice the WER on US accents. Existing adaptation methods either require access to model parameters or overlay an error-correcting module on output transcripts. We highlight the need for correlating outputs with the original speech to fix accent errors. Accordingly, we propose a novel coupling of an open-source accent-tuned local model with the black-box service, where the output from the service guides frame-level inference in the local model. Our fine-grained merging algorithm is better at fixing accent errors than existing word-level combination strategies. Experiments on Indian and Australian accents, with three leading ASR models as the service, show that we achieve as much as 28% relative reduction in WER over both the local and service models.
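The abstract reports results as word error rate (WER) and as a relative reduction in WER. As a minimal sketch of how these two quantities are commonly computed (not the authors' evaluation code), the Python below uses word-level edit distance. The function names and the example sentences and numbers are hypothetical, chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words.

    Assumes whitespace tokenization; real evaluations usually also
    normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical transcripts, not taken from the paper.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {wer:.3f}")

    # Made-up WER values showing how a relative reduction is derived.
    print(f"Relative reduction: {relative_reduction(0.25, 0.18):.2f}")
```

Under this definition, a relative reduction is measured against the baseline's own WER, so the same absolute gain counts for more when the baseline error rate is already low.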