I am a little confused on how to maximize the storage of a canister using Motoko.
My understanding is that canisters can store up to 4 GiB of data in heap memory and 400 GiB in stable memory. However, declaring a stable structure such as a Trie in Motoko does not actually use stable memory; instead, it uses heap memory and is therefore still limited to 4 GiB in size. In order to use stable memory (and have access to 400 GiB), you have to make use of the Stable Regions API, for which some people have developed data structures.
Is my understanding correct? If so, what are the most common data structures people are using that are compatible with Stable Regions API? How much more expensive/inefficient is it to query from Stable Regions API vs a Motoko stable Trie vs a Motoko heap Hash map? Am I going to incur massive costs by storing all of my application data in stable data structures?
I have also heard that there is an update set to release very soon called “Enhanced Orthogonal Persistence” that will allow a Motoko stable Trie to store 400 GiB of data. What is the timeline on this?
My understanding is that canisters can store up to 4 GiB of data in heap memory and 400 GiB in stable memory. However, declaring a stable structure such as a Trie in Motoko does not actually use stable memory; instead, it uses heap memory and is therefore still limited to 4 GiB in size. In order to use stable memory (and have access to 400 GiB), you have to make use of the Stable Regions API, for which some people have developed data structures.
That’s basically correct when using the classic compiler without enhanced orthogonal persistence (EOP) enabled. The 4 GiB limit exists because the classic compiler targets Wasm32, whose 32-bit address space tops out at 4 GiB.
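To make that concrete, here is a minimal sketch (the canister and method names are my own invention, not an established pattern): the `stable` keyword below only means the Trie survives upgrades, by being serialized to stable memory during the upgrade. At run time the live data still sits on the 4 GiB Wasm32 heap.

```motoko
import Trie "mo:base/Trie";
import Text "mo:base/Text";

actor {
  // `stable` = "preserved across upgrades", not "stored in stable memory":
  // at run time this Trie lives entirely on the heap.
  stable var store : Trie.Trie<Text, Nat> = Trie.empty();

  func key(t : Text) : Trie.Key<Text> = { hash = Text.hash(t); key = t };

  public func put(k : Text, v : Nat) : async () {
    store := Trie.put(store, key(k), Text.equal, v).0;
  };

  public query func get(k : Text) : async ?Nat {
    Trie.get(store, key(k), Text.equal)
  };
}
```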
Is my understanding correct? If so, what are the most common data structures people are using that are compatible with Stable Regions API? How much more expensive/inefficient is it to query from Stable Regions API vs a Motoko stable Trie vs a Motoko heap Hash map? Am I going to incur massive costs by storing all of my application data in stable data structures?
I think it largely depends on the granularity of the data, unfortunately. For fine-grained access, we expect stable regions to be much more expensive to use; for coarse-grained access, maybe not so much. There are a few region-based data structures on Mops, but not many.
https://mops.one/search/region
I can’t really vouch for any.
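For reference, the raw Region API in `mo:base/Region` looks roughly like this fragment (a sketch, not a full canister; a Region is a growable, isolated slice of 64-bit stable memory, allocated in 64 KiB pages):

```motoko
import Region "mo:base/Region";

// Allocate a fresh region and grow it by one 64 KiB page.
let r = Region.new();
ignore Region.grow(r, 1); // argument is a page count; 1 page = 65536 bytes

// Store 4 bytes at offset 0, then load them back.
Region.storeBlob(r, 0, "\AA\BB\CC\DD" : Blob);
let b = Region.loadBlob(r, 0, 4);
assert b.size() == 4;
```

The cost model is the key caveat: every load/store crosses into stable memory via system API calls, which is why fine-grained access patterns (many small reads and writes) are expected to be noticeably more expensive than ordinary heap access.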
I have also heard that there is an update set to release very soon called “Enhanced Orthogonal Persistence” that will allow a Motoko stable Trie to store 400 GiB of data. What is the timeline on this?
EOP compiles to Wasm64, so it supports a 64-bit address space that should scale to 400 GiB and beyond. That means your heap Trie will (eventually) be able to occupy much more than 4 GiB of space (at the cost that all Motoko values are now twice as big).
This is in beta, and the IC is rolling out 64-bit Wasm with a low memory cap of 4 GiB (or maybe 6 GiB) just to get comfortable with it before scaling to larger memories. The ultimate goal is to make the full 400 GiB available as main memory.
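For anyone who wants to experiment with the beta: in recent `moc` releases EOP is opt-in via the `--enhanced-orthogonal-persistence` compiler flag, which (to the best of my knowledge) can be passed through `dfx.json` roughly like this; the canister name is just a placeholder:

```json
{
  "canisters": {
    "my_canister": {
      "type": "motoko",
      "main": "src/main.mo",
      "args": "--enhanced-orthogonal-persistence"
    }
  }
}
```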
Thank you for the response, Claudio, really appreciate the help. Being able to access 400 GiB per canister would greatly reduce the backend complexity of my application (it would save me from creating a canister factory and managing several canisters for the same data set due to the 4 GiB space limitation).
In your opinion, is it worth trying to use the Stable Regions API so I can access 400 GiB? Or is it too expensive/slow to use on a canister that will be frequently queried and updated? Do you think I am just better off building a sharded database of Trie structures, in the hope that the Trie will be enhanced in the near future?
Thanks again for the help!
I think it very much depends on the data and the number of concurrent calls.
A single canister will always only provide sequential access to the data, so it might not scale well to lots of users; a multi-canister solution might be more effective at distributing load (cf. CanDB (GitHub: ORIGYN-SA/CanDB), a flexible, performant, and horizontally scalable non-relational multi-canister data storage framework built for the Internet Computer).
If your data is very simple, say a map from keys to large binary blobs of the same size, then using stable regions might be relatively straightforward. You could even keep the indexing structure (a tree of keys) on the heap, but the large data values in a region. By contrast, I think designing a fine-grained data store in stable memory will be a lot of low-level work and might not perform very well.
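That heap-index-plus-region design could look something like the following sketch (all names here are mine, not an established library): a heap-resident Trie maps each key to an (offset, size) pair, and the blobs themselves are appended to a region in stable memory.

```motoko
import Trie "mo:base/Trie";
import Text "mo:base/Text";
import Region "mo:base/Region";
import Nat64 "mo:base/Nat64";

actor {
  // Heap-resident index: key -> (offset, size) of the blob in the region.
  stable var index : Trie.Trie<Text, (Nat64, Nat)> = Trie.empty();
  stable let data = Region.new();
  stable var next : Nat64 = 0; // next free byte offset in the region

  func key(t : Text) : Trie.Key<Text> = { hash = Text.hash(t); key = t };

  public func put(k : Text, b : Blob) : async () {
    let needed = next + Nat64.fromNat(b.size());
    // Grow the region (in 64 KiB pages) until the blob fits.
    while (Region.size(data) * 65536 < needed) {
      ignore Region.grow(data, 1);
    };
    Region.storeBlob(data, next, b);
    index := Trie.put(index, key(k), Text.equal, (next, b.size())).0;
    next := needed;
  };

  public query func get(k : Text) : async ?Blob {
    switch (Trie.get(index, key(k), Text.equal)) {
      case (?(off, len)) ?Region.loadBlob(data, off, len);
      case null null;
    }
  };
}
```

The lookup stays cheap because it happens on the heap; only one region load is paid per retrieved blob, which matches the coarse-grained access pattern where regions are expected to perform reasonably. (Note this sketch never reclaims space for overwritten keys.)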
But I really don’t have much personal experience of this (a compiler dev is probably the wrong person to ask about application architecture, since we seldom write big apps.)
Perhaps someone on the forum can offer more advice.