Paper Title

Self-supervised Rewiring of Pre-trained Speech Encoders: Towards Faster Fine-tuning with Less Labels in Speech Processing

Paper Authors

Hao Yang, Jinming Zhao, Gholamreza Haffari, Ehsan Shareghi

Paper Abstract

Pre-trained speech Transformers have facilitated great success across various speech processing tasks. However, fine-tuning these encoders for downstream tasks requires sufficiently large training data to converge or to achieve state-of-the-art performance. In the text domain, this has been partly attributed to the sub-optimality of the representation space in pre-trained Transformers. In this work, we take a sober look into pre-trained speech encoders and rewire their representation space without requiring any task-specific labels. Our method utilises a neutrally synthesised version of the audio inputs along with frame masking to construct positive pairs for contrastive self-supervised learning. When used to augment the wav2vec 2 encoder, we observe a consistent improvement of isotropy in the representation space. Our experiments on 6 speech processing tasks exhibit a significant convergence speedup during task fine-tuning as well as consistent task improvements, especially in low-resource settings.
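
For illustration only, below is a minimal PyTorch sketch of the contrastive rewiring idea the abstract describes: pooled encoder representations of an utterance and of a neutrally synthesised (TTS) rendering of the same content are treated as a positive pair, frames are randomly masked, and an InfoNCE-style loss pulls the pair together against in-batch negatives. The TinyEncoder stand-in and the mask_frames / info_nce helpers are assumptions made for this sketch, not the authors' released implementation, which augments a pre-trained wav2vec 2 encoder.

# Minimal sketch (not the authors' code) of contrastive rewiring with
# positive pairs built from an utterance and its neutrally synthesised version.
import torch
import torch.nn as nn
import torch.nn.functional as F

def mask_frames(frames: torch.Tensor, mask_prob: float = 0.15) -> torch.Tensor:
    """Randomly zero out whole frames (time steps) as a simple masking scheme."""
    # frames: (batch, time, dim)
    keep = (torch.rand(frames.shape[:2], device=frames.device) > mask_prob).float()
    return frames * keep.unsqueeze(-1)

def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Contrastive loss: matched (original, synthesised) pairs are positives,
    all other in-batch combinations serve as negatives."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature          # (batch, batch) similarity matrix
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)

class TinyEncoder(nn.Module):
    """Stand-in for a pre-trained speech encoder (wav2vec 2 in the paper)."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # Mean-pool frame representations into one utterance-level vector.
        return self.proj(frames).mean(dim=1)

if __name__ == "__main__":
    batch, time, dim = 8, 50, 64
    encoder = TinyEncoder(dim)
    original_frames = torch.randn(batch, time, dim)      # frames of the real recordings
    synthesised_frames = torch.randn(batch, time, dim)   # frames of the neutral TTS versions

    z_orig = encoder(mask_frames(original_frames))
    z_synth = encoder(mask_frames(synthesised_frames))
    loss = info_nce(z_orig, z_synth)
    loss.backward()   # gradients would update ("rewire") the encoder
    print(f"contrastive loss: {loss.item():.4f}")

The paper additionally evaluates the effect of this objective via the isotropy of the resulting representation space; that measurement is omitted from the sketch above.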
