Paper Title
On Text Style Transfer via Style Masked Language Models
Paper Authors
Paper Abstract
Text Style Transfer (TST) can be performed via approaches such as latent space disentanglement, cycle-consistency losses, and prototype editing. The prototype editing approach, known to be quite successful in TST, involves two key phases: a) masking of source style-associated tokens, and b) reconstruction of this source-style-masked sentence conditioned on the target style. We follow a similar transduction method, in which we transpose the harder direct source-to-target TST task into a simpler Style-Masked Language Model (SMLM) task, wherein, similar to BERT \cite{bert}, the goal of our model is to reconstruct the source sentence from its style-masked version. We arrive at the SMLM mechanism naturally by formulating prototype-editing/transduction methods in a probabilistic framework, where TST resolves into estimating a hypothetical parallel dataset from a partially observed parallel dataset, with each domain assumed to share a common latent style-masked prior. To generate this style-masked prior, we use "Explainable Attention" as our choice of attribution for a more precise style-masking step, and we also introduce a cost-effective and accurate "Attribution-Surplus" method for determining mask positions from any arbitrary attribution model in O(1) time. We empirically show that this non-generational approach suits the "content preservation" criterion of a task like TST well, even for a complex style like Discourse Manipulation. Our model, the Style MLM, outperforms strong TST baselines and is on par with state-of-the-art TST models that use complex architectures and orders of magnitude more parameters.
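For concreteness, below is a minimal Python sketch (not the authors' released code) of the two-phase mask-and-reconstruct pipeline the abstract describes. Two assumptions are made: the surplus rule of masking tokens whose attribution exceeds the uniform share 1/n is one plausible reading of "Attribution-Surplus", and a vanilla pretrained BERT stands in for the paper's style-conditioned SMLM (in the actual model, reconstruction would be conditioned on the target style).

```python
# Sketch of the two-phase prototype-editing pipeline:
#   a) style-mask tokens using attribution scores,
#   b) reconstruct the masked sentence with a masked LM.
from transformers import AutoTokenizer, AutoModelForMaskedLM
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def style_mask(tokens, attributions):
    """Phase (a): replace style-associated tokens with [MASK].

    Hypothetical surplus rule: a token is masked when its attribution
    exceeds the uniform share 1/len(tokens). Each decision is a single
    comparison, so no sorting over positions is required.
    """
    surplus_threshold = 1.0 / len(tokens)
    return [
        tokenizer.mask_token if a > surplus_threshold else t
        for t, a in zip(tokens, attributions)
    ]

def reconstruct(masked_tokens):
    """Phase (b): fill masked positions with an MLM, BERT-style."""
    inputs = tokenizer(" ".join(masked_tokens), return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    ids = inputs["input_ids"][0].tolist()
    for i, tok_id in enumerate(ids):
        if tok_id == tokenizer.mask_token_id:
            ids[i] = int(logits[0, i].argmax())
    return tokenizer.decode(ids, skip_special_tokens=True)

# Toy usage: attribution scores would come from a style attribution
# model (e.g. the paper's "Explainable Attention"); here they are
# hand-set so that only the style-bearing word is masked.
tokens = ["the", "food", "was", "terrible"]
attributions = [0.05, 0.10, 0.05, 0.80]
masked = style_mask(tokens, attributions)  # ['the', 'food', 'was', '[MASK]']
print(reconstruct(masked))
```

Because only the masked positions are regenerated, the unmasked content tokens pass through unchanged, which is one way to see why this non-generational setup favors content preservation.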