Title
Auditory-Based Data Augmentation for End-to-End Automatic Speech Recognition
Authors
Abstract
End-to-end models have achieved significant improvements in automatic speech recognition. One common way to improve the performance of these models is to expand the data space through data augmentation. Meanwhile, front-ends inspired by human hearing have also been shown to improve automatic speech recognisers. In this work, a well-verified auditory-based model, which can simulate various hearing abilities, is investigated as a data augmentation method for end-to-end speech recognition. By introducing the auditory model into the data augmentation process, end-to-end systems are encouraged to ignore variation in the signal that cannot be heard and thereby focus on features that are robust for speech recognition. Two mechanisms in the auditory model, spectral smearing and loudness recruitment, are studied on the LibriSpeech dataset with a transformer-based end-to-end model. The results show that the proposed augmentation methods bring statistically significant improvements over the state-of-the-art SpecAugment.
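The abstract does not say how the two mechanisms are implemented. As a rough illustration only, the following sketch applies simplified versions of both to a magnitude spectrogram. Spectral smearing is modelled as a Gaussian spreading of power across neighbouring frequency bins (a stand-in for broadened auditory filters). Loudness recruitment is modelled as a static level expansion that maps a raised hearing threshold to 0 dB while leaving a maximum level unchanged. This is not the auditory model used in the paper. The function names, the Gaussian kernel, the calibration (reading 20·log10 of a magnitude as a dB level), and the random parameter ranges are all hypothetical assumptions.

```python
import math
import random


def spectral_smear(frame, broadening):
    """Spread each bin's power over its neighbours (hypothetical simplification).

    `broadening` is the Gaussian standard deviation in frequency bins. Weights are
    renormalised at the edges, so a flat spectrum stays flat.
    """
    if broadening <= 0:
        raise ValueError("broadening must be positive")
    radius = max(1, int(math.ceil(3 * broadening)))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / broadening) ** 2) for k in offsets]
    power = [m * m for m in frame]
    smeared = []
    for i in range(len(frame)):
        acc = 0.0
        wsum = 0.0
        for offset, w in zip(offsets, kernel):
            j = i + offset
            if 0 <= j < len(frame):
                acc += w * power[j]
                wsum += w
        smeared.append(math.sqrt(acc / wsum))  # back to magnitude
    return smeared


def recruit_db(levels_db, threshold_db, max_db=100.0):
    """Static loudness-recruitment sketch (hypothetical simplification).

    Linearly expands levels so that `threshold_db` maps to 0 dB and `max_db`
    is unchanged; anything quieter than the raised threshold ends up below 0 dB.
    """
    if threshold_db >= max_db:
        raise ValueError("threshold_db must be below max_db")
    k = max_db / (max_db - threshold_db)  # expansion factor >= 1 for threshold >= 0
    return [max_db - k * (max_db - min(level, max_db)) for level in levels_db]


def augment(spectrogram, rng, eps=1e-5):
    """Apply smearing then recruitment with randomly sampled parameters.

    `spectrogram` is a list of frames, each a list of non-negative magnitudes.
    The parameter ranges below are hypothetical, not taken from the paper.
    """
    broadening = rng.uniform(0.5, 3.0)      # hypothetical range, in bins
    threshold_db = rng.uniform(0.0, 40.0)   # hypothetical range, in dB
    out = []
    for frame in spectrogram:
        smeared = spectral_smear(frame, broadening)
        levels = [20.0 * math.log10(max(m, eps)) for m in smeared]
        expanded = recruit_db(levels, threshold_db)
        out.append([10.0 ** (level / 20.0) for level in expanded])
    return out


if __name__ == "__main__":
    spec = [[1000.0, 10.0, 1000.0, 10.0] for _ in range(2)]
    print(augment(spec, random.Random(0)))
```

Real hearing-loss simulators apply recruitment to band envelopes over time rather than to static per-bin levels, and the order of the two steps here is arbitrary. The sketch only conveys the general idea: each utterance receives a randomly sampled simulated hearing profile, in the same on-the-fly spirit as other augmentation schemes such as SpecAugment.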