Paper title
An Empirical Investigation of Beam-Aware Training in Supertagging
Paper authors
Paper abstract
Structured prediction is often approached by training a locally normalized model with maximum likelihood and decoding approximately with beam search. This approach leads to mismatches as, during training, the model is not exposed to its mistakes and does not use beam search. Beam-aware training aims to address these problems, but unfortunately, it is not yet widely used due to a lack of understanding about how it impacts performance, when it is most useful, and whether it is stable. Recently, Negrinho et al. (2018) proposed a meta-algorithm that captures beam-aware training algorithms and suggests new ones, but unfortunately did not provide empirical results. In this paper, we begin an empirical investigation: we train the supertagging model of Vaswani et al. (2016) and a simpler model with instantiations of the meta-algorithm. We explore the influence of various design choices and make recommendations for choosing them. We observe that beam-aware training improves performance for both models, with large improvements for the simpler model, which must effectively manage uncertainty during decoding. Our results suggest that a model must be learned with search to maximize its effectiveness.