# HAMTs from Scratch

*This blog post is also an IHaskell notebook and the source is available separately. I also did a talk at NYHUG based on this material.*

I wanted an explanation for HAMTs (Hash Array Mapped Tries) that was more detailed than Marek Majkowski’s introduction and more approachable than *Ideal Hash Trees* by Phil Bagwell, the paper that introduced them. If you haven’t heard of them before, HAMTs are a way of efficiently representing a hashtable as a trie, and although they were first envisioned as a mutable data structure they are easily adapted to work as a persistent data structure. They form the backbone of the `unordered-containers`

library but the implementation has been lovingly optimised to the point where I found it impenetrable. Edward Z. Yang’s implementation is much easier to follow and after adapting it I think I’m in a good place to provide my own take on them.

Let’s start with a few imports! I’ll be using these packages:

```
import Data.Bits (Bits (bit, complement, popCount, shiftR, (.&.), (.|.)),
FiniteBits (finiteBitSize))
import Data.ByteArray.Hash (FnvHash32 (..), fnv1Hash)
import Data.ByteString.Char8 (pack)
import Data.Char (intToDigit)
import Data.Semigroup ((<>))
import Data.Vector (Vector, drop, singleton, take, (!), (//))
import Data.Word (Word16, Word32)
import Numeric (showIntAtBase)
import Prelude hiding (drop, lookup, take)
import System.TimeIt (timeIt)
import Text.Show.Pretty (pPrint)
```

We’re going to be doing some bit twiddling. To make this easier to follow I’m going to define a `newtype`

whose `Show`

instance displays the binary representation.

```
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
newtype Binary a = Binary a
deriving (Enum, Ord, Real, Integral, Eq, Num, Bits, FiniteBits)
instance (FiniteBits a, Show a, Integral a) => Show (Binary a) where
show (Binary n) = let
= showIntAtBase 2 intToDigit n ""
str = finiteBitSize n
size in replicate (size - length str) '0' <> str
```

Using this `newtype`

we can turn this:

`24732 :: Word16`

`24732`

into this:

`24732 :: Binary Word16`

`0110000010011100`

I’m going to use 32-bit hashes (because they’re more convenient to display than 64-bit ones) and 16-bit bitmaps.

```
type Hash = Binary Word32
type Bitmap = Binary Word16
```

The width of bitmaps is 2^{n} where *n* is the number of bits of the hash that we use at each level of the tree (more on this below). I’m setting *n* = 4 which is what `unordered-containers`

uses (as of this writing), but we could e.g. set *n* = 5 and use 32-bit bitmaps if we wanted.

```
bitsPerSubkey :: Int
= 4 bitsPerSubkey
```

`Shift`

is a multiple of *n* that we will use to focus on the correct part of the hash.

`type Shift = Int`

I’m also going to define a `Hashable`

class to decouple the choice of a hash function from the implementation of `HAMT`

.

```
class Hashable a where
hash :: a -> Hash
```

For convenience, we’ll use the FNV-1 hash function with strings.

```
{-# LANGUAGE TypeSynonymInstances #-}
{-# LANGUAGE FlexibleInstances #-}
instance Hashable String where
= let
hash s FnvHash32 h = fnv1Hash (pack s)
in Binary h
```

Here’s what it looks like in practice.

`"1" :: Binary Word32 hash `

`00000101000011000101110100101110`

A HAMT can be

- empty (
`None`

) - a leaf node with the hash, the key, and the value (
`Leaf`

) - a node with a bitmap and a (non-empty) vector of child HAMTs (
`Many`

)

I’ve chosen to ignore the possibility of collisions, but we could handle them by adding an extra constructor, e.g. `Colliding`

with a hash and a vector of key-value pairs.

```
data HAMT key value
= None
| Leaf Hash key value
| Many Bitmap (Vector (HAMT key value))
deriving (Show)
empty :: HAMT k v
= None empty
```

We’ll need some helper functions for vectors:

`insertAt`

inserts an element at a specified index, shifting elements to the right forwards`updateAt`

replaces an element at a specified index with a new element`deleteAt`

removes an element at an index, shifting elements to the right backwards

```
insertAt :: Vector a -> Int -> a -> Vector a
index a = take index vector <> singleton a <> drop index vector
insertAt vector
updateAt :: Vector a -> Int -> a -> Vector a
index a = vector // [(index, a)]
updateAt vector
deleteAt :: Vector a -> Int -> Vector a
index = take index vector <> drop (index+1) vector deleteAt vector
```

### Insert

I think the bit manipulation functions are crucial to understanding what’s going on, so I’m going to motivate them by trying to define `insert`

without them and coming up with them as they are needed. This initial definition won’t be quite right so I’ll call it `insert_`

to differentiate it from the correct `insert'`

function I present later. The type signature for `insert_`

is

`insert_ :: Hash -> key -> value -> HAMT key value -> HAMT key value`

Inserting a key-value pair into an empty HAMT gives us a single leaf node:

`None = Leaf hash key value insert_ hash key value `

Inserting a key-value pair into a single leaf node where the hashes match gives us an updated leaf node (because we’re pretending collisions don’t exist):

```
Leaf leafHash leafKey leafValue)
insert_ hash key value (| hash == leafHash = Leaf hash key value
```

Inserting into a HAMT consisting of a single leaf node where the hashes don’t match upgrades that leaf node to a `Many`

node and inserts the key-value pair into that `Many`

node:

```
@(Leaf leafHash leafKey leafValue)
insert_ hash key value leaf| hash /= leafHash = insert_ key value (Many someBitmap (singleton leaf))
where someBitmap = undefined
```

#### Bit Masking

Where does `someBitmap`

come from? Time for an example! Let’s start with a `Leaf (hash "1") "1" 1`

:

```
= hash "1"
h = Leaf h "1" 1
leaf
leaf
```

`Leaf 00000101000011000101110100101110 "1" 1`

`someBitMap`

is a 16-bit bitmap where the number of bits set (the `popCount`

) is the length of the vector, which in this case is 1. We want to set one bit, but which bit? We carve off the last *n* bits using a mask:

```
subkeyMask :: Bitmap
= (bit bitsPerSubkey) - 1
subkeyMask
subkeyMask
```

`0000000000001111`

```
-- 0101110100101110
-- .&. 0000000000001111
-----------------------
-- 0000000000001110
= fromIntegral h .&. subkeyMask
fragment
fragment
```

`0000000000001110`

Then we interpret that fragment as a number:

```
Binary position = fragment
position
```

`14`

Finally, we set that bit and we have our bitmap:

```
someBitmap :: Bitmap
= Binary $ bit $ fromIntegral position
someBitmap
someBitmap
```

`0100000000000000`

We’re going to be doing this a lot, so I’ll define this as `bitMask_`

. The extra `_`

is because it isn’t quite right for the same reason as `insert_`

:

```
bitMask_ :: Hash -> Bitmap
= let
bitMask_ hash = fromIntegral hash .&. subkeyMask
fragment Binary position = fragment
in Binary (bit (fromIntegral position))
```

Let’s look at the `Many`

case. If we try inserting into a node where the bit in the bitmap corresponding to the mask is `0`

, this means that there is an empty slot in the vector. We can insert a leaf node into this slot and set the corresponding bit in the bitmap to `1`

:

```
Many bitmap vector)
insert_ hash key value (| bitmap .&. mask == 0 = let
= Leaf (hash key) key value
leaf = insertAt vector index leaf
vector' = bitmap .|. mask
bitmap' in Many bitmap' vector'
where
= bitMask_ hash
mask index = undefined
```

#### Mask Indexing

What `index`

do we use? This is where `popCount`

makes an appearance. Let’s demonstrate by inserting `("10", 2)`

into our example. First we get the mask corresponding to `hash "10"`

:

```
= bitMask_ (hash "10")
mask
mask
```

`0000010000000000`

Next we want to find the number of lower bits that have been set. We use `mask - 1`

as a mask:

`- 1 mask `

`0000001111111111`

```
-- 0100000000000000
-- .&. 0000001111111111
-----------------------
-- 0000000000000000
= someBitmap .&. (mask - 1)
masked
masked
```

`0000000000000000`

Then we count the number of bits set with `popCount`

:

```
index = popCount masked
index
```

`0`

And this is the index we need to insert at! We’ll call this `maskIndex`

:

```
maskIndex :: Bitmap -> Bitmap -> Int
= popCount (bitmap .&. (mask - 1)) maskIndex bitmap mask
```

The final case is where the bit in the bitmap is already set. We need to recursively update the HAMT at the corresponding index:

```
Many bitmap vector)
insert_ hash key value (| bitmap .&. mask == 1 = let
= insert_ hash key value (vector ! index) -- WRONG!
subtree' = updateAt vector index subtree'
vector' in Many bitmap vector'
where
= bitMask_ hash
mask index = maskIndex bitmap mask
```

But this definition is wrong, because instead of carving off the last *n* bits of `hash`

, we want to recursively carve off the next *n* bits!

#### Shifting

This is what’s missing from our definition, a `shift`

parameter corresponding to how far up the `hash`

we’re looking. This is why we defined `Shift`

above. Taking this extra parameter into account, our bit manipulation functions now become:

```
subkeyMask :: Bitmap
= (bit bitsPerSubkey) - 1
subkeyMask
maskIndex :: Bitmap -> Bitmap -> Int
= popCount (bitmap .&. (mask - 1))
maskIndex bitmap mask
subkey :: Hash -> Shift -> Int
= fromIntegral $ (fromIntegral $ shiftR hash shift) .&. subkeyMask
subkey hash shift
bitMask :: Hash -> Shift -> Bitmap
= bit (subkey hash shift) bitMask hash shift
```

We plumb through this `shift`

parameter, only modifying it in the final case, to give us the correct definitions of `insert'`

and `insert`

:

```
insert :: Hashable key => key -> value -> HAMT key value -> HAMT key value
= insert' 0 (hash key) key value hamt
insert key value hamt
insert' :: Shift -> Hash -> key -> value -> HAMT key value -> HAMT key value
None = Leaf hash key value
insert' shift hash key value
@(Leaf leafHash leafKey leafValue)
insert' shift hash key value leaf| hash == leafHash = Leaf hash key value
| otherwise = insert' shift hash key value (Many (bitMask leafHash shift) (singleton leaf))
Many bitmap vector)
insert' shift hash key value (| bitmap .&. mask == 0 = let
= Leaf hash key value
leaf = insertAt vector index leaf
vector' = bitmap .|. mask
bitmap' in Many bitmap' vector'
| otherwise = let
= vector ! index
subtree = insert' (shift+bitsPerSubkey) hash key value subtree
subtree' = updateAt vector index subtree'
vector' in Many bitmap vector'
where
= bitMask hash shift
mask index = maskIndex bitmap mask
```

Now we can construct HAMTs and inspect them! I’ll define a `fromList`

function and use `pPrint`

from `pretty-show`

to highlight the tree structure:

```
fromList :: Hashable key => [(key, value)] -> HAMT key value
= foldr (uncurry insert) empty
fromList
= fromList [("1", 1), ("10", 2), ("100", 3), ("1000", 4)]
example
pPrint example
```

```
Many
0100010000000000
[ Many
0000000100100000
[ Leaf 00100000011101101010111101011010 "10" 2
, Leaf 10001010111100101011011010001010 "1000" 4
]
, Many
0000001000000100
[ Leaf 00000101000011000101110100101110 "1" 1
, Leaf 01110100110101100000101010011110 "100" 3
]
]
```

### Lookup

Compared to `insert`

, `lookup`

is a walk in the park. It’s implemented along the same lines as `insert`

:

- on
`None`

nodes, it fails - on
`Leaf`

nodes, it succeeds if the hashes match - on
`Many`

nodes, it fails if the bit isn’t set, and recurses into the child node otherwise

```
lookup :: Hashable key => key -> HAMT key value -> Maybe value
lookup key hamt = lookup' 0 (hash key) hamt
lookup' :: Shift -> Hash -> HAMT key value -> Maybe value
None = Nothing
lookup' shift hash
Leaf leafHash leafKey leafValue)
lookup' shift hash (| hash == leafHash = Just leafValue
| otherwise = Nothing
Many bitmap vector)
lookup' shift hash (| bitmap .&. mask == 0 = Nothing
| otherwise = lookup' (shift+bitsPerSubkey) hash (vector ! index)
where
= bitMask hash shift
mask index = maskIndex bitmap mask
```

Let’s quickly confirm that it works.

`lookup "100" example`

`Just 3`

#### Memoising Fibonacci

We now have enough of an API to use this as a hashtable! Let’s use it to memoise the calculation of the Fibonacci sequence. The naive implementation does a lot of unnecessary recomputation:

```
fib :: Int -> Int
0 = 1
fib 1 = 1
fib = fib (n-1) + fib (n-2)
fib n
$ print $ fib 30 timeIt
```

```
1346269
CPU time: 1.31s
```

We can memoise it by storing previously calculated results and using them if they are available:

```
instance Hashable Int where
= Binary (fromIntegral int)
hash int
fib' :: HAMT Int Int -> Int -> (Int, HAMT Int Int)
0 = (1, insert 0 1 table)
fib' table 1 = (1, insert 1 1 table)
fib' table = case lookup n table of
fib' table n Just i -> (i, table)
Nothing -> let
= fib' table (n-1)
(i1, table') = fib' table' (n-2)
(i2, table'') in (i1 + i2, insert n (i1 + i2) table'')
fib :: Int -> Int
= fst $ fib' empty n
fib n
$ print $ fib 30 timeIt
```

```
1346269
CPU time: 0.00s
```

### Delete

Finally we come to `delete`

, which is only a little more complex than `lookup`

. It needs to make sure that no `Many`

node has a child `None`

node, so if a `None`

node:

- is an only child, it will replace the parent node
- has any sibling nodes, it will be removed from the parent node’s bitmap and vector

`Leaf`

nodes similarly replace their parents if they are the only child.

```
delete :: Hashable key => key -> HAMT key value -> HAMT key value
= delete' 0 (hash key) hamt
delete key hamt
delete' :: Shift -> Hash -> HAMT key value -> HAMT key value
None = None
delete' shift hash
@(Leaf leafHash leafKey leafValue)
delete' shift hash leaf| hash == leafHash = None
| otherwise = leaf
@(Many bitmap vector)
delete' shift hash many| bitmap .&. mask == 0 = many
| otherwise = let
= vector ! index
subtree = delete' (shift+bitsPerSubkey) hash subtree
subtree' in case subtree' of
None -> if length vector == 1
then None
else Many (bitmap .&. complement mask) (deleteAt vector index)
Leaf{} -> if length vector == 1
then subtree'
else Many bitmap (updateAt vector index subtree')
Many{} -> Many bitmap (updateAt vector index subtree')
where
= bitMask hash shift
mask index = maskIndex bitmap mask
```

Let’s see this in action.

`$ delete "1000" example pPrint `

```
Many
0100010000000000
[ Many
0000000000100000 [ Leaf 00100000011101101010111101011010 "10" 2 ]
, Many
0000001000000100
[ Leaf 00000101000011000101110100101110 "1" 1
, Leaf 01110100110101100000101010011110 "100" 3
]
]
```

It’s possible to have a situation where we have a `Many`

node with only one child, because our replacement behaviour checks the length of the vector before any elements are removed from it. However, removing the last leaf will correctly delete the parent node.

`$ delete "10" $ delete "1000" example pPrint `

```
Many
0100000000000000
[ Many
0000001000000100
[ Leaf 00000101000011000101110100101110 "1" 1
, Leaf 01110100110101100000101010011110 "100" 3
]
]
```

And we’re done! I hope you understand HAMTs better than when you started reading this.

If you want to use this for something other than educational purposes, I would recommend adding logic to deal with hash collisions, which I intentionally omitted. There’s also some low-hanging fruit in terms of performance optimisations. The first thing that comes to mind is an additional `Full`

constructor for the case where all bits in the bitmap are set, and the next thing is the use of unsafe vector functions that omit bounds checking.

Thanks to Evan Borden, Javier Candeira, Jean Niklas L’orange, Mark Hopkins, and Tim Humphries for comments and feedback.