Paper Title

Lombard Effect for Bilingual Speakers in Cantonese and English: importance of spectro-temporal features

Authors

Scharf, Maximilian Karl; Hochmuth, Sabine; Wong, Lena L. N.; Kollmeier, Birger; Warzybok, Anna

Abstract

For a better understanding of the mechanisms underlying speech perception and the contribution of different signal features, computational models of speech recognition have a long tradition in hearing research. Due to the diverse range of situations in which speech needs to be recognized, these models need to be generalizable across many acoustic conditions, speakers, and languages. This contribution examines the importance of different features for speech recognition predictions of plain and Lombard speech for English in comparison to Cantonese in stationary and modulated noise. While Cantonese is a tonal language that encodes information in spectro-temporal features, the Lombard effect is known to be associated with spectral changes in the speech signal. These contrasting properties of tonal languages and the Lombard effect form an interesting basis for the assessment of speech recognition models. Here, an automatic speech recognition (ASR)-based model using spectral or spectro-temporal features is evaluated with empirical data. The results indicate that spectro-temporal features are crucial for predicting the speaker-specific speech recognition threshold SRT$_{50}$ in both Cantonese and English as well as for accounting for the improvement of speech recognition in modulated noise, while effects due to Lombard speech can already be predicted by spectral features.
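
For readers unfamiliar with the SRT$_{50}$ mentioned in the abstract: it is the signal-to-noise ratio (SNR) at which 50% of the speech material is recognized. The following is a minimal Python sketch of how such a threshold could be read off a set of recognition scores. It assumes a logistic psychometric function and simple linear interpolation; the function names, the SNR grid, and the parameter values are hypothetical and are not taken from the paper, whose fitting procedure is not described here.

```python
import math


def psychometric(snr_db, srt50_db, slope_per_db):
    """Logistic psychometric function (an assumed model form).

    Returns the proportion of speech recognized at snr_db.
    slope_per_db is the slope of the curve at the 50% point.
    """
    return 1.0 / (1.0 + math.exp(-4.0 * slope_per_db * (snr_db - srt50_db)))


def estimate_srt50(snrs_db, proportions):
    """Linearly interpolate the SNR at which recognition crosses 50%.

    Expects snrs_db in ascending order; returns None if the scores
    never cross 50% on the given grid.
    """
    points = list(zip(snrs_db, proportions))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= 0.5 <= y1 and y1 > y0:
            return x0 + (0.5 - y0) * (x1 - x0) / (y1 - y0)
    return None


# Hypothetical values for illustration only (not data from the paper).
snrs = [-12.0, -10.0, -8.0, -6.0, -4.0, -2.0]
scores = [psychometric(s, srt50_db=-7.0, slope_per_db=0.15) for s in snrs]
srt50 = estimate_srt50(snrs, scores)
```

A lower SRT$_{50}$ means that speech is understood at a less favorable SNR, so an intelligibility benefit from Lombard speech or from modulated noise would show up as a shift of this threshold toward lower values.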
