Paper Title
An Effective Contextual Language Modeling Framework for Speech Summarization with Augmented Features
Paper Authors
Paper Abstract
Tremendous amounts of multimedia associated with speech information are driving an urgent need to develop efficient and effective automatic summarization methods. To this end, we have seen rapid progress in applying supervised deep neural network-based methods to extractive speech summarization. More recently, the Bidirectional Encoder Representations from Transformers (BERT) model was proposed and has achieved record-breaking success on many natural language processing (NLP) tasks, such as question answering and language understanding. In view of this, in this paper we contextualize and enhance the state-of-the-art BERT-based model for speech summarization; our contributions are at least three-fold. First, we explore the incorporation of confidence scores into sentence representations to see whether this helps alleviate the negative effects caused by imperfect automatic speech recognition (ASR). Second, we augment the sentence embeddings obtained from BERT with extra structural and linguistic features, such as sentence position and inverse document frequency (IDF) statistics. Finally, we validate the effectiveness of our proposed method on a benchmark dataset, in comparison to several classic and celebrated speech summarization methods.
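The abstract describes the feature-augmentation idea only at a high level. As a rough, hypothetical sketch (not the authors' actual implementation), one might concatenate a BERT sentence embedding with scalar ASR-confidence, sentence-position, and IDF features before scoring sentences for extraction. The sketch below assumes the Hugging Face transformers library; all function names, the specific feature set, and the concatenation-based fusion scheme are illustrative assumptions.

```python
# Hypothetical sketch of augmenting BERT sentence embeddings with extra
# features (ASR confidence, sentence position, IDF), as described in the
# abstract. NOT the authors' implementation; details are assumptions.
import math
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

def idf(doc_frequency: int, num_documents: int) -> float:
    """Standard inverse document frequency for a term."""
    return math.log(num_documents / (1 + doc_frequency))

def sentence_embedding(sentence: str) -> torch.Tensor:
    """Encode a sentence with BERT and take the [CLS] vector as its embedding."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = bert(**inputs)
    return outputs.last_hidden_state[0, 0]  # [CLS] token, shape (768,)

def augmented_representation(sentence: str,
                             asr_confidence: float,
                             position: int,
                             num_sentences: int,
                             avg_idf: float) -> torch.Tensor:
    """Concatenate the BERT embedding with scalar structural/linguistic
    features; the exact feature set and fusion scheme are assumptions."""
    emb = sentence_embedding(sentence)
    extra = torch.tensor([
        asr_confidence,                          # ASR confidence for the sentence
        position / max(num_sentences - 1, 1),    # normalized sentence position
        avg_idf,                                 # average IDF of the sentence's terms
    ])
    return torch.cat([emb, extra])  # shape (768 + 3,)
```

A downstream ranker or classifier would then score each augmented sentence vector to decide which sentences to include in the extractive summary.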