Skip to main content Link Search Menu Expand Document (external link)

PaCHash: Packed and Compressed Hash Tables

DOI arXiv slides pdf code

Abstract

We introduce PaCHash, a hash table that stores its objects contiguously in an array without intervening space, even if the objects have variable size. In particular, each object can be compressed using standard compression techniques. A small search data structure allows locating the objects in constant expected time. PaCHash is most naturally described as a static external hash table where it needs a constant number of bits of internal memory per block of external memory. However, PaCHash can be dynamized and is also useful for internal memory, having lower space consumption than all previous approaches even when considering only objects of identical size. For example, in some sense it beats a lower bound on the space consumption of k-perfect hashing. An implementation for fast SSDs needs about 5 bits of internal memory per block of external memory, requires only one disk access (of variable length) per search operation and has internal search overhead small compared to the disk access cost.

BibTeX

@inproceedings{Kurpicz2023ALENEX,
  author    = {Florian Kurpicz and Hans-Peter Lehmann and Peter Sanders},
  title     = {PaCHash: Packed and Compressed Hash Tables},
  booktitle = {ALENEX},
  pages     = {162–175},
  publisher = {SIAM},
  doi       = {10.1137/1.9781611977561.ch14},
  year      = {2023}
}

PDF