# Minimal Perfect Hashing

• Minimal
• Perfect
• Hashing

## Hashing

function that takes a key to a fixed size value (often an integer)

## Perfect Hashing

hashing without collisions (injective)

## Minimal Perfect Hashing

• all possible keys are known in advance
• perfect hashing without gaps
• bijective mapping from n keys to 1..n

## Pros

• usually more space-efficient than most other approaches
• (especially if you are using this to create a hashtable)

## Cons

• upfront construction cost
• static
• ⚠️ produces nonsense results for absent keys ⚠️

## tl;dr

• if you’re not sure you need this, you probably don’t

# Approaches

## Not Discussed Today

• Hash, Displace, and Compress (CHD)
• Hypergraph peeling?

## Also Not Discussed Today

``minimalPerfectHash keys = Map.fromList \$ zip keys [0..]``

# How does it work?

## Requirements

• Family of hash functions
• Bitvectors supporting rank (and select)

## Family of hash functions?

• My understanding is that hash functions that take a salt should work
• Suspiciously similar to Bloom filters

## Bitvectors supporting rank and select?

• rank(i) is the number of 1s at or prior to index i (1-indexed)
• select(i) is the index n where the ith 1 is located (1-indexed)
• Look up “succinct data structures” for more fun with these

# How it works

## High level

1. Until we’ve reached max level or there are no more keys:
1. Turn the key into a number i ∈ [0..n)
2. Inspect bitvector[i]:
1. if 0 and no collision: bitvector[i] = 1
2. if 1: bitvector[i] = 0 and record collision
3. if 0 and collision: do nothing
3. Remove non-colliding keys
2. If there are leftover keys, put them in a hashtable

## Turning the key into a number

``````-- gamma is a multiplier for the size of the bitvector
value = hashWithSalt currentLevel key `mod` (gamma * currentLength)``````

## Populating the bitvector

1. Initialise two bitvectors B and C with 0s
2. When setting an index i:
1. If B[i] ≡ 0 and C[i] ≡ 0 then B[i] = 1
2. If B[i] ≡ 1 then B[i] = 0 and C[i] = 1
3. If B[i] ≡ 0 and C[i] ≡ 1 then do nothing

## Lookup

1. For each level:
1. Hash the key and check if the corresponding index is set
2. If so, find the rank
3. If not, increment the level count and repeat
2. Otherwise check the leftovers

# Example

• Bondi
• Tamarama
• Bronte
• Clovelly
• Gordons Bay
• Coogee

## Level 0

``````┌─┐
│0│ <- ["Clovelly","Bronte"]
├─┤
│1│ <- ["Gordons Bay"]
├─┤
│0│
├─┤
│0│
├─┤
│0│ <- ["Coogee","Tamarama"]
├─┤
│1│ <- ["Bondi"]
└─┘``````

## Level 1

``````┌─┐
│0│
├─┤
│0│
├─┤
│0│
├─┤
│0│ <- ["Coogee","Clovelly","Bronte","Tamarama"]
└─┘``````

## Level 2

``````┌─┐
│0│ <- ["Coogee","Clovelly","Bronte","Tamarama"]
├─┤
│0│
├─┤
│0│
├─┤
│0│
└─┘``````

## Level 3

``````┌─┐
│0│
├─┤
│0│ <- ["Coogee","Clovelly","Bronte","Tamarama"]
├─┤
│0│
├─┤
│0│
└─┘``````

## Level 4

``````┌─┐
│0│
├─┤
│0│
├─┤
│0│ <- ["Coogee","Clovelly","Bronte","Tamarama"]
├─┤
│0│
└─┘``````

## …

• I went ahead and ran this for 20 more levels and the keys kept colliding
• No slack in our bitvectors
• Let’s try with `gamma = 1.5`

## Level 0

``````┌─┐
│1│ <- ["Bronte"]
├─┤
│1│ <- ["Gordons Bay"]
├─┤
│0│
├─┤
│0│
├─┤
│0│ <- ["Coogee","Tamarama"]
├─┤
│0│
├─┤
│1│ <- ["Clovelly"]
├─┤
│0│
├─┤
│1│ <- ["Bondi"]
└─┘``````

## Level 1

``````┌─┐
│0│ <- ["Coogee","Tamarama"]
├─┤
│0│
├─┤
│0│
└─┘``````

## Level 2

``````┌─┐
│1│ <- ["Tamarama"]
├─┤
│1│ <- ["Coogee"]
├─┤
│0│
└─┘``````

## Lookup

`````` 0 1 2 3 4 5 6 7 8
┌─┬─┬─┬─┬─┬─┬─┬─┬─┐
│1│1│0│0│0│0│1│0│1│ b0
└─┴─┴─┴─┴─┴─┴─┴─┴─┘
└──────────── hashWithSalt 0 "Coogee" `mod` 9
┌─┬─┬─┐
│0│0│0│ b1
└─┴─┴─┘
└──────────────────── hashWithSalt 1 "Coogee" `mod` 3
┌─┬─┬─┐
│1│1│0│ b2
└─┴─┴─┘
└────────────────── hashWithSalt 2 "Coogee" `mod` 3``````

## Lookup

``````> hashWithSalt 0 "Coogee" `mod` 9
4
> hashWithSalt 1 "Coogee" `mod` 3
0
> hashWithSalt 2 "Coogee" `mod` 3
1
> popCount b0 + popCount b1 + rank b2 1
6``````

## False Positive

`````` 0 1 2 3 4 5 6 7 8
┌─┬─┬─┬─┬─┬─┬─┬─┬─┐
│1│1│0│0│0│0│1│0│1│ b0
└─┴─┴─┴─┴─┴─┴─┴─┴─┘
└────────────────── hashWithSalt 0 "Shelly" `mod` 9
┌─┬─┬─┐
│0│0│0│ b1
└─┴─┴─┘
┌─┬─┬─┐
│1│1│0│ b2
└─┴─┴─┘``````

## False Positive

``````> hashWithSalt 0 "Shelly" `mod` 9
1
> rank b0 1
2``````

## Minimal Perfect Hash Table

1. Create a values vector
2. values[hash(key)] = value
3. ???
4. PROFIT!!!