Paper Title

Learning to Enhance or Not: Neural Network-Based Switching of Enhanced and Observed Signals for Overlapping Speech Recognition

Authors

Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Naoyuki Kamo, Takafumi Moriya

Abstract

The combination of a deep neural network (DNN)-based speech enhancement (SE) front-end and an automatic speech recognition (ASR) back-end is a widely used approach to overlapping speech recognition. However, the SE front-end generates processing artifacts that can degrade ASR performance. We previously found that such degradation can occur even under fully overlapping conditions, depending on the signal-to-interference ratio (SIR) and signal-to-noise ratio (SNR). To mitigate the degradation, we introduced a rule-based method that switches the ASR input between the enhanced and observed signals, which showed promising results. However, the rule's optimality was unclear because it was heuristically designed and based only on SIR and SNR values. In this work, we propose a DNN-based switching method that directly estimates whether ASR will perform better on the enhanced or the observed signal. We also introduce soft-switching, which computes a weighted sum of the enhanced and observed signals as the ASR input, with weights given by the switching model's output posteriors. The proposed learning-based switching showed performance comparable to that of rule-based oracle switching. Soft-switching further improved ASR performance and achieved a relative character error rate reduction of up to 23% compared with the conventional method.
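
The difference between hard switching and soft-switching described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say in which domain the signals are mixed or how the switching model is built, so the sketch assumes a two-class posterior (enhanced vs. observed), time-aligned signals of equal length, and a plain sample-wise mix. The function names, the toy signals, the logits, and the 0.5 threshold are all hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hard_switch(enhanced: Sequence[float], observed: Sequence[float],
                p_enhanced: float, threshold: float = 0.5) -> List[float]:
    """Pick exactly one signal as the ASR input.

    Assumption: the enhanced signal is chosen when its posterior reaches
    the (hypothetical) threshold; otherwise the observed mixture is used.
    """
    return list(enhanced) if p_enhanced >= threshold else list(observed)


def soft_switch(enhanced: Sequence[float], observed: Sequence[float],
                p_enhanced: float) -> List[float]:
    """Weighted sum of both signals, weighted by the switching posteriors.

    Assumption: the two posteriors sum to one, so the weight of the
    observed signal is 1 - p_enhanced.
    """
    if len(enhanced) != len(observed):
        raise ValueError("signals must be time-aligned and of equal length")
    if not 0.0 <= p_enhanced <= 1.0:
        raise ValueError("p_enhanced must be a probability in [0, 1]")
    p_observed = 1.0 - p_enhanced
    return [p_enhanced * e + p_observed * o
            for e, o in zip(enhanced, observed)]


# Toy stand-ins for the SE output and the observed mixture.
enhanced = [0.0, 1.0, 0.0, -1.0]
observed = [0.5, 0.5, -0.5, -0.5]

# Hypothetical switching-model logits, ordered as [enhanced, observed].
p_enhanced, p_observed = softmax([1.2, 0.3])

asr_input_hard = hard_switch(enhanced, observed, p_enhanced)
asr_input_soft = soft_switch(enhanced, observed, p_enhanced)
```

With a posterior of 1.0 or 0.0 the soft mix reduces to the hard choice of the enhanced or the observed signal, so under these assumptions hard switching is a special case of soft-switching.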
