What Motoko based data structure are people using to query user data by Principal ID in stable memory? I was thinking about using HashMap or RBTree, but apparently I am not allowed to use those structures for stable memory - what is the reason for that too? Anyway, if anyone has an efficient way to query user data by Principal ID please let me know, and if that is not the best method please let me know what you are using instead. Thanks
Have you looked into stable-trie by @timo?
There are other packages as well, maybe Motoko-hashtable is a good fit as well?
What exactly is your user data? To answer the question it is important to know whether it is constant size (same size for each user) or variable size (size differs from user to user). And also if user data can be overwritten later in a way that the size changes.
It is not clear to me from your question if you want a data structure that lives in stable memory (in Regions) or if you want one that lives in heap and gets serialized to stable memory during upgrades.
For the first kind (Regions) there are only two that I know of:
- https://mops.one/stable-trie
- GitHub - sardariuss/MotokoStableBTree: https://forum.dfinity.org/t/icdevs-org-bounty-24-stablebtree-mokoko-up-to-10k/14867 (doesn’t seem to be published on mops)
For the second kind there are many options. If you want that kind please respond and I can list more. For this kind you can also use standard RBTree that you mentioned. It has share/unshare functions that you can put in preupgrade/postupgrade.
The user data is not constant size and I would like the ability to change its size at any point. I was going to use a Trie but I ran into issues using the motoko Hash library (mo:base/Hash) to hash the Principal ID. Thanks for taking the time to try and help btw!
I wanted to use the Motoko Trie library (mo:base/Trie) at first. However, in order to create a balanced Trie with Principal ID being the key I needed to hash it first. I have run into issues using the Motoko Hash library on Principal IDs. And it seems like the only applicable Motoko based data structure that is stable memory compatible. So I am just unsure what to use now.
I feel like it would be best to just have a fully compatible stable memory data structure. BTW this data structure is supposed to just store basic user data that they choose to give the application. Like name, interests, username. Things like that.
What are your thoughts on: https://mops.one/stableheapbtreemap ?
In that case I usually give out consecutive numerical user ids (for internal use only, not exposed to the user) to the principals as they sign up. The map principal → user id is stored in a tree. The user data is then stored in a vector with user id as the index in the vector. That approach gives a lot of flexibility. The expensive trie lookup is performed only once, afterwards all accesses are fast vector accesses (array accesses). Some internal tasks, if the user id is already known, can even be performed without any tree lookup. The vector can be declared stable and will be serialized at upgrade without you having to do anything. For the tree map principal → user id you now have more options than before because the user id is constant size.
If you care about space waste after deletion of a user: an entry in the vector will remain unused forever if a user id isn’t used anymore. If you delete the data to which the vector entry points then you will only waste 4 bytes for the unused vector entry. That seems acceptable.
For the tree map principal → user id we have developed and used two data structures specifically for this purpose. Both offer the inverse map user id → principal as well (which you may need or not).
The characteristics of the first one:
- is a heap data structure
- has a share/unshare interface for persistence across upgrades
- is based on RBTree from base
- is space efficient because it offers both directions of the map but stores each principal only once
- has no deletion
- expected to work with high 10s of millions of user (maybe not 100s of millions)
The characteristics of the second one:
- is not a heap datastructure but lives directly in stable memory (Regions)
- no serialization needed at upgrades
- based on a trie implementation written from scratch
- comes in two versions: first version called “Map” has deletion but not the inverse map user id → principal, second version called “Enumeration” has the inverse map but does not offer deletion
- expected to work with billions of users
No deletion means you are going to store each principal forever. Often this is just ok.
That will work. It’s a heap datastructure but you can declare it stable and it will serialize automatically at upgrade time.
This is really great, exactly the response I needed. Thank you so much!