Self-supervised language learning from raw audio: Lessons from the Zero Resource Speech Challenge

Paper title

Self-supervised language learning from raw audio: Lessons from the Zero Resource Speech Challenge

Authors

Ewan Dunbar, Nicolas Hamilakis, Emmanuel Dupoux

Abstract

Recent progress in self-supervised or unsupervised machine learning has opened the possibility of building a full speech processing system from raw audio without using any textual representations or expert labels such as phonemes, dictionaries or parse trees. The contribution of the Zero Resource Speech Challenge series since 2015 has been to break down this long-term objective into four well-defined tasks -- Acoustic Unit Discovery, Spoken Term Discovery, Discrete Resynthesis, and Spoken Language Modeling -- and introduce associated metrics and benchmarks enabling model comparison and cumulative progress. We present an overview of the six editions of this challenge series since 2015, discuss the lessons learned, and outline the areas which need more work or give puzzling results.
