论文标题
快速增长的自然合并中疾驰
Galloping in fast-growth natural merge sorts
论文作者
论文摘要
我们研究在基于合并的分类算法中合并常规的影响。更确切地说,我们专注于Timsort用来合并单调子阵列(以下称为Runs)的疾驰例程,以及如果使用此例程而不是幼稚的合并例程,则对执行的元素比较数量的影响进行。 引入了此例程,以使Timsort在几乎不同的值的数组上更有效。 las,我们证明,尽管它使Timsort排序数组在线性时间内具有两个值,但它并不能阻止Timsort最多需要$θ(n \ log(n))$ element $ element比较,以将长度〜$ n $的数组排序具有三个不同的值。但是,我们还证明,在最坏的情况下,当对$ n $的长度$ n $和$σ$不同的值进行排序时,在最坏情况下,略微修改了Timsort的疾驰的例程会导致仅需要$ \ MATHCAL {o}(n + n \ log(σ))$ element Compariess。 我们这样做是通过重点关注1990年代引入的双运行的概念以及相关的双跑步熵。该概念既与不同的值的数量以及数组中的运行数有关,该数组的数量带有其自己的运行长度熵,该熵用于解释Timsort的“超自然”效率。我们还引入了自然合并类型的快速和中等增长的新概念(即基于合并运行的算法),这些算法在类似于Timsort的几种合并排序算法中可以找到。 我们证明,只要它们使用我们使用蒂姆索特(Timsort)疾驰的例行程序的变体来合并运行的算法,在对具有低跑步诱导或双run诱导的复杂性的阵列排序阵列方面,尽可能有效。
We study the impact of merging routines in merge-based sorting algorithms. More precisely, we focus on the galloping routine that TimSort uses to merge monotonic sub-arrays, hereafter called runs, and on the impact on the number of element comparisons performed if one uses this routine instead of a naïve merging routine. This routine was introduced in order to make TimSort more efficient on arrays with few distinct values. Alas, we prove that, although it makes TimSort sort array with two values in linear time, it does not prevent TimSort from requiring up to $Θ(n \log(n))$ element comparisons to sort arrays of length~$n$ with three distinct values. However, we also prove that slightly modifying TimSort's galloping routine results in requiring only $\mathcal{O}(n + n \log(σ))$ element comparisons in the worst case, when sorting arrays of length $n$ with $σ$ distinct values. We do so by focusing on the notion of dual runs, which was introduced in the 1990s, and on the associated dual run-length entropy. This notion is both related to the number of distinct values and to the number of runs in an array, which came with its own run-length entropy that was used to explain TimSort's otherwise "supernatural" efficiency. We also introduce new notions of fast- and middle-growth for natural merge sorts (i.e., algorithms based on merging runs), which are found in several merge sorting algorithms similar to TimSort. We prove that algorithms with the fast- or middle-growth property, provided that they use our variant of TimSort's galloping routine for merging runs, are as efficient as possible at sorting arrays with low run-induced or dual-run-induced complexities.