Paper Title

Transcormer: Transformer for Sentence Scoring with Sliding Language Modeling

Authors

Kaitao Song, Yichong Leng, Xu Tan, Yicheng Zou, Tao Qin, Dongsheng Li

Abstract

Sentence scoring aims to measure the likelihood of a sentence and is widely used in many natural language processing scenarios, such as reranking, which selects the best sentence from multiple candidates. Previous work on sentence scoring mainly adopted either causal language modeling (CLM), like GPT, or masked language modeling (MLM), like BERT, both of which have limitations: 1) CLM uses only unidirectional information to estimate the probability of a sentence, without considering bidirectional context, which hurts scoring quality; 2) MLM can estimate the probability of only a subset of tokens at a time and thus requires multiple forward passes to estimate the probability of the whole sentence, which incurs large computation and time costs. In this paper, we propose Transcormer, a Transformer model with a novel sliding language modeling (SLM) approach for sentence scoring. Specifically, SLM adopts a triple-stream self-attention mechanism that estimates the probability of every token in a sentence with bidirectional context and requires only a single forward pass. SLM thus avoids the limitations of CLM (unidirectional context only) and MLM (multiple forward passes) while inheriting their advantages, achieving both high effectiveness and high efficiency in scoring. Experimental results on multiple tasks demonstrate that our method outperforms other language modeling approaches.
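The abstract describes the triple-stream self-attention only at a high level, so the following is a minimal sketch under our own assumptions, not the authors' implementation. We assume a forward stream with a causal (left-to-right) mask, a backward stream with the mirrored (right-to-left) mask, and a query stream in which the prediction for position i reads forward-stream states strictly left of i and backward-stream states strictly right of i. All helper names (forward_stream_mask, backward_stream_mask, query_stream_masks, visible_tokens) are hypothetical. The sketch checks that, under these masks, every token is predicted from all other tokens but never from itself, which is what would let one forward pass score a whole sentence. By contrast, the common MLM practice of masking one token per pass needs n passes for an n-token sentence.

```python
from typing import List, Set, Tuple

# mask[q][k] is True when query position q may attend to key position k.
Mask = List[List[bool]]


def forward_stream_mask(n: int) -> Mask:
    """Forward stream (assumed): position q attends to positions 0..q, as in a causal LM."""
    return [[k <= q for k in range(n)] for q in range(n)]


def backward_stream_mask(n: int) -> Mask:
    """Backward stream (assumed): position q attends to positions q..n-1."""
    return [[k >= q for k in range(n)] for q in range(n)]


def query_stream_masks(n: int) -> Tuple[Mask, Mask]:
    """Query stream (assumed) used to predict token q.

    Returns (to_forward, to_backward): the query for position q reads
    forward-stream states strictly left of q and backward-stream states
    strictly right of q, so token q itself is never visible.
    """
    to_forward = [[k < q for k in range(n)] for q in range(n)]
    to_backward = [[k > q for k in range(n)] for q in range(n)]
    return to_forward, to_backward


def visible_tokens(n: int, i: int) -> Set[int]:
    """Input tokens whose information can reach the query state for position i."""
    fwd, bwd = forward_stream_mask(n), backward_stream_mask(n)
    to_forward, to_backward = query_stream_masks(n)
    seen: Set[int] = set()
    for k in range(n):
        if to_forward[i][k]:
            # Forward state k has only seen tokens 0..k (all left of i).
            seen.update(t for t in range(n) if fwd[k][t])
        if to_backward[i][k]:
            # Backward state k has only seen tokens k..n-1 (all right of i).
            seen.update(t for t in range(n) if bwd[k][t])
    return seen


if __name__ == "__main__":
    n = 5
    for i in range(n):
        # Each position sees the full bidirectional context minus itself.
        print(i, sorted(visible_tokens(n, i)))
```

For example, with 5 tokens the query for position 2 draws on tokens 0, 1, 3, and 4. Under this assumed design, a sentence score could then be formed by summing log P(x_i | all other tokens) over every position, with all terms produced in the same single forward pass.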