论文标题
Pachash:包装和压缩的哈希表
PaCHash: Packed and Compressed Hash Tables
论文作者
论文摘要
我们介绍了Pachash,这是一张哈希表,即使对象具有可变大小,也将其对象连续存储在数组中,而无需干预空间。特别是,可以使用标准压缩技术压缩每个对象。较小的搜索数据结构允许将对象定位在恒定的预期时间。 Pachash最自然地描述为静态外部哈希表,在该表中需要每块外部内存的内部内存数量恒定。从某种意义上说,在这里,Pachash在K-Perfect Hashing的空间消耗中击败了一个下限。快速SSD的实现每块外部存储器的内存大约5位,每个搜索操作只需要一个磁盘访问(可变长度),并且与磁盘访问成本相比,内部搜索间接费用较小。我们的实验表明,即使考虑相同尺寸的对象,它的空间消耗也比以前的所有方法较低。
We introduce PaCHash, a hash table that stores its objects contiguously in an array without intervening space, even if the objects have variable size. In particular, each object can be compressed using standard compression techniques. A small search data structure allows locating the objects in constant expected time. PaCHash is most naturally described as a static external hash table where it needs a constant number of bits of internal memory per block of external memory. Here, in some sense, PaCHash beats a lower bound on the space consumption of k-perfect hashing. An implementation for fast SSDs needs about 5 bits of internal memory per block of external memory, requires only one disk access (of variable length) per search operation, and has small internal search overhead compared to the disk access cost. Our experiments show that it has lower space consumption than all previous approaches even when considering objects of identical size.