Abstract away the 4GB canister memory limit

The 4GB memory limit on canisters is one of the major complications I’m running into while building on top of the Internet Computer, especially when building fundamental infrastructure tools.

Orthogonal persistence is a beautiful dream, but the memory limit and the necessity of splitting data structures and apps across multiple canisters for scaling…somewhat hinder that dream.

I’m wondering if it would be possible for this limit to be abstracted away below the canister level, so that it is not exposed to developers. Considering that the limit has to be dealt with somewhere, it seems an explicit choice was made to let the developers deal with it. If we can deal with it as developers, then couldn’t the platform also deal with it, but perhaps in a more generalized manner?

If this is at all possible, I believe it would truly simplify so many things for developers for years to come.

I of course don’t know much about the internals of the system, but couldn’t something like virtual memory on our operating systems work for canisters? Instead of canisters directly accessing Wasm module memory, perhaps a virtual memory system could be created that automatically scales up and down according to the needs of canisters, and canisters could interact directly with that virtual memory system.
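To make the idea concrete, here is a minimal, purely hypothetical sketch (in Rust) of what such a system-provided virtual memory interface might look like from the canister’s side. None of these types exist in the IC APIs today, and a real design would surely differ; the point is only that the canister addresses one flat logical space while the platform decides where the bytes actually live.

```rust
// Hypothetical sketch only: a "virtual memory" interface the platform could
// expose to canisters. The canister sees one conceptually unbounded address
// space; whether the bytes sit in the Wasm heap, stable memory, or somewhere
// else entirely is the platform's problem, not the developer's.
trait VirtualMemory {
    /// Read `buf.len()` bytes starting at `offset` in the logical address space.
    fn read(&self, offset: u64, buf: &mut [u8]);

    /// Write `data` starting at `offset`, growing the backing storage
    /// transparently if needed.
    fn write(&mut self, offset: u64, data: &[u8]);

    /// Current logical size; the platform worries about where this fits.
    fn size(&self) -> u64;
}
```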

It would just be such a dream to create something like a HashMap, and just keep pumping values into it forever without any thought to 4GB limits and cross-canister calls and managing all of that craziness.
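For concreteness, the wished-for experience is roughly the sketch below (assuming the Rust CDK’s `#[ic_cdk::update]` / `#[ic_cdk::query]` attributes). Something like this works today, but the HashMap lives in the canister’s Wasm heap, so it tops out around 4GB instead of growing indefinitely.

```rust
// A minimal sketch of the "just keep inserting" experience with the Rust CDK.
// Today the HashMap is confined to the canister's Wasm heap (~4GB), rather
// than scaling transparently as the original post wishes for.
use std::cell::RefCell;
use std::collections::HashMap;

thread_local! {
    static STORE: RefCell<HashMap<String, Vec<u8>>> = RefCell::new(HashMap::new());
}

#[ic_cdk::update]
fn put(key: String, value: Vec<u8>) {
    STORE.with(|s| s.borrow_mut().insert(key, value));
}

#[ic_cdk::query]
fn get(key: String) -> Option<Vec<u8>> {
    STORE.with(|s| s.borrow().get(&key).cloned())
}
```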

7 Likes

That limit is imposed by the current state of affairs in WebAssembly, and we might find a solution there, too, in the future. Several in-progress Wasm proposals might allow a canister to store more than fits into a single 32-bit memory.

It will take a while, though, until the IC supports a new Wasm feature, as we have to wait for all of our dependencies that deal with Wasm to support it first.

It seems like the limit could still be abstracted away by DFINITY. Are you saying that because this limit will likely be addressed by a future Wasm feature, DFINITY is choosing to wait for that and not to abstract it away in the meantime?

In the days of 32-bit operating systems, the 4GB process address space limitation could be worked around (e.g., spawn a separate process), but it was a pain. But that was merely a memory limit; you still had the whole hard drive to store data on. A 4GB limit on both memory and storage (persistent memory) is even more of a challenge to deal with if you require more than that.

2 Likes

In the days of 32-bit operating systems, the 4GB process address space limitation could be worked around (e.g., spawn a separate process), but it was a pain. But that was merely a memory limit; you still had the whole hard drive to store data on.

Sounds like how I interpret the point of BigMap: to be the ‘hard drive’/filesystem where you have the ‘whole Internet Computer’ to store data on. After all, the filesystem for that hard drive is a bit like a Map with keys like /path/to/file.

One important difference: a physical hard drive still has a relatively low size limit, and I/O limitations as long as it’s on the same physical disk. A distributed data structure on the IC will likely have different performance characteristics/limitations than a hard drive, for better or worse.

1 Like

It would just be such a dream to create something like a HashMap, and just keep pumping values into it forever without any thought to 4GB limits and cross-canister calls and managing all of that craziness.

IMO everyone should depend on an abstract Map type that can be satisfied by a HashMap for now and by BigMap when it’s ready for primetime. The exact Map can be dependency-injected with various implementations in the future (depending on the tradeoffs desired, e.g. redundancy/cost/latency).
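As a rough Rust sketch of what coding against such an abstraction could look like (note that `BigMapClient` is a made-up placeholder here, since the real BigMap API isn’t settled):

```rust
// Hypothetical sketch: application code depends on an abstract key-value
// interface, so the backing store can be swapped out later.
use std::collections::HashMap;

trait KvStore {
    fn put(&mut self, key: String, value: Vec<u8>);
    fn get(&self, key: &str) -> Option<Vec<u8>>;
}

/// Single-canister backend: fine until the data approaches the 4GB limit.
struct LocalStore(HashMap<String, Vec<u8>>);

impl KvStore for LocalStore {
    fn put(&mut self, key: String, value: Vec<u8>) {
        self.0.insert(key, value);
    }
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.0.get(key).cloned()
    }
}

// A BigMap-backed implementation would satisfy the same trait, so code
// written against `KvStore` would not need to change:
//
// impl KvStore for BigMapClient { ... }
```

Which implementation gets injected could then be decided per deployment, based on how much data is expected and what redundancy, cost, and latency tradeoffs are acceptable.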

I don’t think BigMap works well for @lastmjs’s use case.

I guess I’m talking more about the development experience when implementing something like BigMap. There are many other scalable data structures we’ll need in the future for the IC to support varying use cases, and the developers of each of those data structures will always have to deal with the canister memory limit. But if the memory were scaled automatically by the system, then that complexity would largely (hopefully) disappear for the developer.

3 Likes

I guess it’s expected that authors of low-level libraries (e.g. a BigMap-like thing) will have to deal with certain complexities so that their users don’t have to. If there weren’t these complexities, you wouldn’t need someone to write a library, right? :slight_smile:

Even if the system allowed you to scale your memory beyond 4GB (actually 8GB if you count stable memory, which is in a way an anticipation of the world with multiple memories), you would still be limited by the capacity of a subnet. Only scaling across multiple canisters will give you something that could possibly be considered “infinitely scalable”.
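For reference, stable memory is already a second address space, reached through API calls rather than ordinary loads and stores. A rough sketch using ic_cdk’s stable memory functions follows; exact signatures vary between ic-cdk versions (older releases use u32 offsets or the stable64_* names), so treat this as illustrative only.

```rust
// Sketch of using stable memory as a second address space, separate from the
// Wasm heap. Signatures assume a recent ic-cdk.
use ic_cdk::api::stable::{stable_grow, stable_read, stable_size, stable_write};

const PAGE_BYTES: u64 = 64 * 1024; // stable memory grows in 64KiB Wasm pages

fn save_blob(offset: u64, data: &[u8]) {
    let end = offset + data.len() as u64;
    let current_pages = stable_size();
    let needed_pages = (end + PAGE_BYTES - 1) / PAGE_BYTES;
    if needed_pages > current_pages {
        // stable_grow adds pages on top of the current allocation.
        stable_grow(needed_pages - current_pages).expect("out of stable memory");
    }
    stable_write(offset, data);
}

fn load_blob(offset: u64, len: usize) -> Vec<u8> {
    let mut buf = vec![0u8; len];
    stable_read(offset, &mut buf);
    buf
}
```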

3 Likes