We published a new package https://mops.one/stable-trie
It is a key-value map that lives entirely and always in stable memory so that it is not limited by 4 GB heap and does not require serialization during canister upgrade. So upgrades are “free”.
Keys and values are constant size Blobs, whose lengths are configured in the constructor. The trie internally stores data in two Motoko Regions.
It is also efficient. Here is a profiling comparison against two heap data structures and one stable memory data structure. The third column is this package.
method | rb tree | zhus map | stable trie map | motoko stable btree |
---|---|---|---|---|
put | 3_748 | 3_720 | 4_476 | 259_441 |
inside get | 2_196 | 1_905 | 3_778 | 202_699 |
ouside get | 1_605 | 1_089 | 2_325 | 209_641 |
inside deletion | 5_034 | 2_152 | 10_548 | 445_004 |
outside deletion | 4_148 | 1_085 | 2_364 | 406_710 |
Here, “inside” means that the key that is looked up or deleted is present in the tree.
“Outside” means that the key is not present (it’s a “miss”).
And here is an analysis of the memory utilization which can be obtained by running mops test large
:
children number: 4
pointer size: 4
keys: 16_777_216
byte size: 327_693_280
bytes per key: 19
leaves: 16_777_216
nodes: 12_092_222
nodes per leaf: 0.720753
pointers per leaf: 2.883010
children per node: 2.387439
children utilization: 0.596860
This is a trie with 4 children per node and 16.7 million keys in it. The keys are 8 bytes long and uniformly distributed. The value size is 0, i.e. this trie is not actually a map but merely functions as a set. The trie has 12 million inner nodes, 16.7 million leaves and occupies a total of 327 MB. This amounts to 19.5 bytes per key. The 8 bytes of the key are part of those 19.5 bytes so we can say the trie overhead is 11.5 bytes per key. This overhead does not depend on the key size or on the value size. It turns out that it does no depend on the size of the trie (the number of keys in it) either as long as the keys are uniformly distributed.
A typical example use is to store keys that are 29 bytes long representing user principals. Then the storage requirement would be 11.5 + 29 = 40.5 bytes per user plus the value size stored for each user. The value size can be 0 for a set or any other constant value.
The above trie uses a pointer size of 4 bytes which allows to store up to 2 billion keys. If more are required then a pointer size of 5 bytes can be used. In this case the overhead will go up from 11.5 bytes to 14.4 bytes.
Let us know if you have use for this data structure!