Paper Title
Small but Mighty: New Benchmarks for Split and Rephrase
Paper Authors
Paper Abstract
Split and Rephrase is a text simplification task of rewriting a complex sentence into simpler ones. As a relatively new task, it is paramount to ensure the soundness of its evaluation benchmark and metric. We find that the widely used benchmark dataset universally contains easily exploitable syntactic cues caused by its automatic generation process. Taking advantage of such cues, we show that even a simple rule-based model can perform on par with the state-of-the-art model. To remedy such limitations, we collect and release two crowdsourced benchmark datasets. We not only make sure that they contain significantly more diverse syntax, but also carefully control for their quality according to a well-defined set of criteria. While no satisfactory automatic metric exists, we apply fine-grained manual evaluation based on these criteria using crowdsourcing, showing that our datasets better represent the task and are significantly more challenging for the models.