Engineering Faster Sorters for Small Sets of Items

Paper Title

Engineering Faster Sorters for Small Sets of Items

Authors

Timo Bingmann, Jasper Marianczuk, Peter Sanders

Abstract

Sorting a set of items is a task that can be useful by itself or as a building block for more complex operations. That is why a lot of effort has been put into finding algorithms that sort large sets as fast as possible. But the more sophisticated and complex these algorithms become, the less efficient they are for small sets of items, due to large constant factors. We aim to determine whether there is a faster way than insertion sort to sort small sets of items, in order to provide a more efficient base case sorter. We looked at sorting networks, at how they can improve the speed of sorting few elements, and at how to implement them efficiently using conditional moves. Since sorting networks need to be implemented explicitly for each set size, providing networks for larger sizes becomes less efficient due to increased code size. To also enable the sorting of slightly larger base cases, we adapted sample sort to Register Sample Sort, which breaks those larger sets down into sizes that can in turn be sorted by sorting networks. In our experiments, when sorting only small sets, the sorting networks outperform insertion sort by a factor of at least 1.76 for any array size between six and sixteen, and by a factor of 2.72 on average across all machines and array sizes. When integrating sorting networks as a base case sorter into Quicksort, we achieved far smaller performance improvements, probably because the networks have a larger code size and clutter the L1 instruction cache. But for x86 machines with a larger L1 instruction cache of 64 KiB or more, we obtained speedups of 12.7% when using sorting networks as a base case sorter in std::sort. In conclusion, the desired improvement in speed could only be achieved under special circumstances, but the results clearly show the potential of using conditional moves in the field of sorting algorithms.
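
To illustrate the idea of a sorting network built from branch-free compare-exchange steps, here is a minimal sketch in C++. It is not the authors' implementation: the names `compare_exchange` and `sort4` are hypothetical, the element type is fixed to `int` for simplicity, and the sketch only assumes that a compiler will typically translate the ternary selects into conditional moves on x86, which is not guaranteed. The comparator sequence is the standard five-comparator network for four inputs.

```cpp
#include <array>

// Compare-exchange of two values without a data-dependent branch in the
// source. Compilers often lower these selects to conditional moves (cmov)
// on x86, but that is an assumption about code generation, not a guarantee.
inline void compare_exchange(int& a, int& b) {
    const bool swap_needed = b < a;
    const int lo = swap_needed ? b : a;
    const int hi = swap_needed ? a : b;
    a = lo;
    b = hi;
}

// Hypothetical fixed-size sorter: the standard 5-comparator sorting network
// for 4 inputs. The sequence of comparisons does not depend on the data,
// which is what distinguishes a sorting network from insertion sort.
inline void sort4(std::array<int, 4>& v) {
    compare_exchange(v[0], v[1]);
    compare_exchange(v[2], v[3]);
    compare_exchange(v[0], v[2]);
    compare_exchange(v[1], v[3]);
    compare_exchange(v[1], v[2]);
}
```

Calling `sort4` on an array holding 4, 3, 2, 1 leaves it as 1, 2, 3, 4. Because every array size needs its own explicit comparator list, the amount of generated code grows with the supported size, which is the code-size cost the abstract mentions.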

Paper Title