Improved Speech Representations with Multi-Target Autoregressive Predictive Coding

Title

Improved Speech Representations with Multi-Target Autoregressive Predictive Coding

Authors

Chung, Yu-An; Glass, James

Abstract

Training objectives based on predictive coding have recently been shown to be very effective at learning meaningful representations from unlabeled speech. One example is Autoregressive Predictive Coding (Chung et al., 2019), which trains an autoregressive RNN to generate an unseen future frame given a context such as recent past frames. The basic hypothesis of these approaches is that hidden states that can accurately predict future frames are a useful representation for many downstream tasks. In this paper we extend this hypothesis and aim to enrich the information encoded in the hidden states by training the model to make more accurate future predictions. We propose an auxiliary objective that serves as a regularizer to improve the generalization of the future frame prediction task. Experimental results on phonetic classification, speech recognition, and speech translation not only support the hypothesis, but also demonstrate the effectiveness of our approach in learning representations that contain richer phonetic content.
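The future-frame prediction task that the abstract builds on can be illustrated with a small NumPy sketch. It is not the authors' implementation: `toy_rnn_predictions` and `apc_l1_loss` are hypothetical names, a single-layer Elman RNN with random weights stands in for the trained autoregressive RNN, and the loss is assumed to be the per-frame L1 distance (summed over feature dimensions, averaged over time) between the output at step k and the frame `shift` steps later. The auxiliary regularization objective proposed in the paper is not specified in the abstract, so it is deliberately left out.

```python
import numpy as np


def toy_rnn_predictions(frames, W_in, W_h, W_out):
    """Run a minimal unidirectional (causal) Elman RNN over frames of shape (T, D).

    Hypothetical stand-in for the autoregressive RNN in APC; returns one
    D-dimensional prediction per time step.
    """
    h = np.zeros(W_h.shape[0])
    outputs = []
    for x_k in frames:
        # h_k depends only on frames[0..k], so the model never sees the future.
        h = np.tanh(W_in @ x_k + W_h @ h)
        outputs.append(W_out @ h)
    return np.stack(outputs)


def apc_l1_loss(predictions, frames, shift):
    """Mean per-frame L1 distance between the output at step k and frame k + shift."""
    if predictions.shape != frames.shape:
        raise ValueError("predictions and frames must have the same shape")
    T = frames.shape[0]
    if not 0 < shift < T:
        raise ValueError("shift must satisfy 0 < shift < T")
    # Only the first T - shift outputs have a future target inside the utterance.
    diff = predictions[: T - shift] - frames[shift:]
    return float(np.abs(diff).sum(axis=1).mean())


rng = np.random.default_rng(0)
T, D, H = 20, 40, 64  # toy sizes: 20 frames of 40-dim features, 64 hidden units
frames = rng.normal(size=(T, D))
W_in = rng.normal(scale=0.1, size=(H, D))
W_h = rng.normal(scale=0.1, size=(H, H))
W_out = rng.normal(scale=0.1, size=(D, H))

preds = toy_rnn_predictions(frames, W_in, W_h, W_out)
print(apc_l1_loss(preds, frames, shift=3))
```

Here `shift` plays the role of the number of steps ahead being predicted. Predicting more than one frame ahead is meant to stop the model from simply exploiting the strong similarity between adjacent frames; in practice the weights would be trained by gradient descent on this loss rather than left random.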
