Announcing Motoko package stable-trie

We published a new package https://mops.one/stable-trie

It is a key-value map that lives entirely and always in stable memory so that it is not limited by 4 GB heap and does not require serialization during canister upgrade. So upgrades are “free”.

Keys and values are constant size Blobs, whose lengths are configured in the constructor. The trie internally stores data in two Motoko Regions.

It is also efficient. Here is a profiling comparison against two heap data structures and one stable memory data structure. The third column is this package.

method rb tree zhus map stable trie map motoko stable btree
put 3_748 3_720 4_476 259_441
inside get 2_196 1_905 3_778 202_699
ouside get 1_605 1_089 2_325 209_641
inside deletion 5_034 2_152 10_548 445_004
outside deletion 4_148 1_085 2_364 406_710

Here, “inside” means that the key that is looked up or deleted is present in the tree.
“Outside” means that the key is not present (it’s a “miss”).

And here is an analysis of the memory utilization which can be obtained by running mops test large:

children number: 4
pointer size: 4
keys: 16_777_216
byte size: 327_693_280
bytes per key: 19
leaves: 16_777_216
nodes: 12_092_222
nodes per leaf: 0.720753
pointers per leaf: 2.883010
children per node: 2.387439
children utilization: 0.596860

This is a trie with 4 children per node and 16.7 million keys in it. The keys are 8 bytes long and uniformly distributed. The value size is 0, i.e. this trie is not actually a map but merely functions as a set. The trie has 12 million inner nodes, 16.7 million leaves and occupies a total of 327 MB. This amounts to 19.5 bytes per key. The 8 bytes of the key are part of those 19.5 bytes so we can say the trie overhead is 11.5 bytes per key. This overhead does not depend on the key size or on the value size. It turns out that it does no depend on the size of the trie (the number of keys in it) either as long as the keys are uniformly distributed.

A typical example use is to store keys that are 29 bytes long representing user principals. Then the storage requirement would be 11.5 + 29 = 40.5 bytes per user plus the value size stored for each user. The value size can be 0 for a set or any other constant value.

The above trie uses a pointer size of 4 bytes which allows to store up to 2 billion keys. If more are required then a pointer size of 5 bytes can be used. In this case the overhead will go up from 11.5 bytes to 14.4 bytes.

Let us know if you have use for this data structure!

11 Likes

How is stable-trie better than using Motoko Regions directly?

A trie is a key-value map that let’s you look up values by keys efficiently in O(log n) time. A Region is bare memory, that cannot offer lookup by keys.

4 Likes

This is great news. We are extensively using base:trie for storing our data. Will soon evaluate migration. Thank you for sharing :slight_smile: