Training for Speech Recognition on Coprocessors

Paper Title

Training for Speech Recognition on Coprocessors

Authors

Sebastian Baunsgaard, Sebastian B. Wrede, Pınar Tozun

Abstract

Automatic Speech Recognition (ASR) has increased in popularity in recent years. The evolution of processor and storage technologies has enabled more advanced ASR mechanisms, fueling the development of virtual assistants such as Amazon Alexa, Apple Siri, Microsoft Cortana, and Google Home. The interest in such assistants, in turn, has amplified the novel developments in ASR research. However, despite this popularity, there has not been a detailed training efficiency analysis of modern ASR systems. This mainly stems from: the proprietary nature of many modern applications that depend on ASR, like the ones listed above; the relatively expensive co-processor hardware that big vendors use to accelerate ASR and enable such applications; and the absence of well-established benchmarks. The goal of this paper is to address the latter two of these challenges. The paper first describes an ASR model, based on a deep neural network inspired by recent work in this domain, and our experiences building it. Then we evaluate this model on three CPU-GPU co-processor platforms that represent different budget categories. Our results demonstrate that utilizing hardware acceleration yields good results even without high-end equipment. While the most expensive platform (10x the price of the least expensive one) converges to the initial accuracy target 10-30% and 60-70% faster than the other two, the differences among the platforms almost disappear at slightly higher accuracy targets. In addition, our results further highlight both the difficulty of evaluating ASR systems, due to the complex, long, and resource-intensive nature of model training in this domain, and the importance of establishing benchmarks for ASR.
