First off, I’d like to congratulate the team on releasing wasm64 to mainnet at the end of 2024!
As the post mentions, after the wasm64 release, per-canister heap memory is still limited to 4GB. Many developers haven’t migrated to 64-bit wasm yet, as much of the underlying data will double in size when moving from 32-bit to 64-bit.
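To make that "doubling" concrete, here's a rough illustration (a generic Rust sketch on my part, not Motoko's actual heap layout): every pointer and length field grows from 4 to 8 bytes on a 64-bit target, so pointer-heavy structures roughly double in size while flat byte buffers stay the same.

```rust
use std::mem::size_of;

fn main() {
    // A raw pointer is 4 bytes on wasm32 and 8 bytes on wasm64.
    println!("pointer: {} bytes", size_of::<*const u8>());

    // A Vec header is (pointer, capacity, length): 12 bytes on wasm32,
    // 24 bytes on wasm64. The heap bytes it points to don't grow.
    println!("Vec<u8> header: {} bytes", size_of::<Vec<u8>>());

    // Plain numeric payloads keep their size on both targets.
    println!("1KB byte array: {} bytes", size_of::<[u8; 1024]>());
}
```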
I’ve heard through the grapevine that the plan is to raise heap memory to 6GB, 8GB, and then beyond, but I expect the big migration of developers from 32-bit to 64-bit memory will happen when 8GB of heap memory is available to canisters.
What are the expected timelines for reaching 6GB and 8GB of heap memory?
What can we expect to see 2 months from now, by mid-2025, and by end-2025 in terms of heap memory expansion? Is the timeline similar to the original expansion of stable memory, or are there additional caveats to consider?
My personal interest in asking the questions above is that I’d like to store more fine-grained data on-chain in my Motoko canister, but I want to leave enough breathing room to navigate data migrations and potential surges in application usage.
For reference, my application currently stores ~200MB, and I’d ideally like to increase the historical data kept by a factor of 72. This means I’d be storing ~14GB with these adjustments. However, given ICP’s strong developer growth last year, I’d like to keep a ~5X storage buffer most of the time and know that 70GB+ of heap will be available in the future, or at least know the expected rate of heap size increases.
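Spelling out the back-of-the-envelope math behind those figures (the 72x and 5x factors are just my numbers from above):

```rust
fn main() {
    let current_mb = 200.0;    // roughly what the canister stores today
    let history_factor = 72.0; // keep 72x more historical data
    let buffer_factor = 5.0;   // desired ~5x storage headroom

    let target_gb = current_mb * history_factor / 1024.0;
    let with_buffer_gb = target_gb * buffer_factor;

    println!("target storage: ~{:.1} GB", target_gb);      // ~14.1 GB
    println!("with 5x buffer: ~{:.1} GB", with_buffer_gb); // ~70.3 GB
}
```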
One major difference from stable memory is that we can allow large stable memory but still limit the amount of stable memory accessed in a given message. We don’t currently have the ability to limit per-message heap memory access. This means that if a few canisters run concurrently and each accesses a lot of heap memory, they risk putting the node under memory pressure.
So when it comes to allowing much larger heaps, it will take longer, since it’s not just a question of bumping the numbers up.
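To make that difference concrete, here's a rough sketch (my own illustration, not the replica's actual code) of the kind of per-message accounting we can do for stable memory, because every access goes through an API call, but can't currently do cheaply for plain heap loads and stores:

```rust
use std::collections::HashSet;

const PAGE_SIZE: usize = 64 * 1024; // Wasm page size
// Illustrative per-message budget; we currently cap stable memory
// access at 4GB per message, but the real mechanism differs in detail.
const PER_MESSAGE_LIMIT: usize = 4 * 1024 * 1024 * 1024;

/// Hypothetical per-message meter of touched memory pages.
struct AccessMeter {
    touched: HashSet<usize>,
}

impl AccessMeter {
    fn new() -> Self {
        Self { touched: HashSet::new() }
    }

    /// Record an access at `addr`; fail once the message has touched
    /// more pages than the per-message budget allows.
    fn record(&mut self, addr: usize) -> Result<(), &'static str> {
        self.touched.insert(addr / PAGE_SIZE);
        if self.touched.len() * PAGE_SIZE > PER_MESSAGE_LIMIT {
            Err("per-message memory access limit exceeded")
        } else {
            Ok(())
        }
    }
}

fn main() {
    let mut meter = AccessMeter::new();
    // Two accesses on the same 64KiB page only count as one page.
    meter.record(0).unwrap();
    meter.record(100).unwrap();
    assert_eq!(meter.touched.len(), 1);
}
```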
Looking at the current sandboxes across subnets, I’d guess that the memory distribution of canisters is such that you have just a few canisters with a significant amount of memory, and then a long tail of many canisters with small amounts of memory.
I’d imagine this memory distribution is in part to be expected, but it is also an artifact of the 4GB heap limitation and of work done in projects before the expansion of stable memory and now Wasm64. Now, many applications such as KongSwap are returning to single-canister architectures that experience more traffic and utilize larger amounts of memory. And if the heap memory limit is raised in the coming months, I’d expect to see more applications use fewer canisters, but use larger amounts of memory and compute per canister.
While it sounds like extending heap memory to 100+GB requires significant work, what would the immediate impact of bumping canister heap memory up to 8GB be? Would we see an issue on any subnets, and what tuning/tweaking might need to be done afterwards? Are there any blockers for a simple doubling of canister heap memory?
On the Motoko side, if we’re really going to restructure the base library and build some robust collection structures, this would be a good time to look at strategies for processing very large heaps without having to access the equivalent amount of memory.
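Language aside, the pattern I have in mind is something like the following sketch (written in Rust for brevity, with made-up names; the same cursor-plus-batch idea would apply to Motoko collections): keep an explicit cursor and only process a bounded batch per message, so no single call has to walk the whole heap.

```rust
/// Hypothetical large append-only log held in canister memory.
struct Log {
    entries: Vec<u64>,
    cursor: usize, // index of the next unprocessed entry
}

impl Log {
    /// Process at most `batch` entries per call, so a single message
    /// only touches a bounded slice of the heap.
    fn process_batch(&mut self, batch: usize) -> usize {
        let end = (self.cursor + batch).min(self.entries.len());
        let processed = end - self.cursor;
        for entry in &self.entries[self.cursor..end] {
            // Placeholder for the real per-entry work.
            let _ = entry;
        }
        self.cursor = end;
        processed
    }
}

fn main() {
    let mut log = Log { entries: (0..1_000_000).collect(), cursor: 0 };
    // Spread the traversal over many calls instead of one big scan.
    while log.process_batch(10_000) > 0 {}
    assert_eq!(log.cursor, log.entries.len());
}
```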
I don’t think we can answer that with certainty at the moment. The biggest risk is probably that if a lot of messages are executing and touching all 8GB of memory, it could cause long rounds and significant degradation in subnet performance (e.g. lots of ingress messages timing out, as with the BoB incidents a few months ago). The first step is to test scenarios like this to get an idea of the impact.
Does a single beefy canister vs. multiple canisters matter in this scenario? For example, if you already have ~50 2GB canisters that regularly receive messages touching sparse, different areas of memory, wouldn’t this have the same effect as a single 100GB canister? Or is there a greater likelihood of this happening within a single canister, hence the issue?
Also, what’s the current cap on node RAM used for sandboxes?
I think there are two major differences with 50 canisters touching 2GB versus one canister touching 100GB.
With 50 canisters touching 2GB per message, we pause after each message and see what work was done before deciding if the round is complete. So if these canisters are doing a lot of work by accessing memory, the messages will get spread over many rounds, which means other canisters will get the chance to run in between. With one large canister touching 100GB, we currently don’t have a good way of checking memory accesses mid-message and using that information to DTS the message across multiple rounds. So we potentially end up with a single very long round which prevents new messages from coming in.
This could be fixed if we can find an efficient way to deterministically calculate memory accesses mid-message.
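As a toy illustration of the idea (my own sketch, not the scheduler's real code, and all numbers are made up): instruction counting already gives us a deterministic per-slice budget, and the missing piece is an equally cheap, deterministic count of memory accessed, so that a message could be paused and resumed across rounds once either budget runs out.

```rust
/// Toy model of slicing one long message across rounds, pausing the
/// message whenever a per-slice budget runs out.
struct Budget {
    instructions_left: u64,
    pages_left: u64, // the hypothetical memory-access budget discussed above
}

enum SliceResult {
    Finished,
    Paused, // resume in a later round with a fresh budget
}

/// Run chunks of work until the message is done or the budget is spent.
/// Each work item is (instructions executed, memory pages touched).
fn run_slice(work: &mut Vec<(u64, u64)>, budget: &mut Budget) -> SliceResult {
    while let Some((instructions, pages)) = work.last().copied() {
        if instructions > budget.instructions_left || pages > budget.pages_left {
            return SliceResult::Paused;
        }
        budget.instructions_left -= instructions;
        budget.pages_left -= pages;
        work.pop(); // this chunk is done
    }
    SliceResult::Finished
}

fn main() {
    // One message made of 100 chunks, each executing 1k instructions
    // and touching 10 pages.
    let mut work = vec![(1_000u64, 10u64); 100];
    let mut rounds = 0;
    loop {
        rounds += 1;
        let mut budget = Budget { instructions_left: 20_000, pages_left: 500 };
        if let SliceResult::Finished = run_slice(&mut work, &mut budget) {
            break;
        }
    }
    println!("message finished after {} rounds", rounds);
}
```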
We currently run 4 update threads and 4 query threads. With the 50 canisters using 2GB each, only 8 can be running at any one time, which means they’d only be using 16GB of memory together. But if one canister touches 100GB, then that means it’s using up 100GB of memory which potentially causes memory pressure and slows things down (there could also be 7 other canisters running simultaneously).
Currently there’s no hard cap set up, but we limit stable memory access to 4GB per message and the heap can be 6GB now, so generally the sandbox shouldn’t be using much more than 10 GB RAM.