There are two basic requirements for an implementation of stable vars:
Variable access must be (amortised) constant time, not time linear in the size of the data it holds. (Otherwise, it will be a gigantic performance/cost foot gun.)
It must be possible to reclaim stable memory no longer used, without irrecoverable fragmentation. (Otherwise it will be a space/cost trap for long-running apps.)
It is not obvious how to implement stable vars directly in stable memory with the current API such that these requirements are satisfied.
Also, keep in mind that we cannot afford to change the implementation of stable variables often, since every old version must be supported forever, and the complexity of the system grows roughly quadratically with each change (while testable correctness worsens accordingly). In practice, I think that means we’ll only have one, at most two, more shots in production. Hence I’d be super-reluctant to introduce half-hearted solutions, even if it’s just a stop-gap measure.
I don’t see how the API matters, except for performance: you could have the compiler emit the code it would if we had multi-mem and mem64 at the Wasm level, and then replace the reads and writes with the System API calls – either very naively, access for access, or a bit more cleverly with page- or object-level caching. It would be more expensive in instructions, but it would give the right semantics, and that may be good enough for plenty of services already (we don’t hear much complaint about people hitting the per-message cycle limit).
For the sake of the argument, assume we had multi-mem and 64-bit memory, and stable memory at the Wasm and IC level: how would we implement stable vars then? It would be great to have a plan and to know how much it would take – even if we then conclude that we don’t have the dev capacity right now, or that there is no reasonable way to shim it on top of the existing System API.
We’d need to implement some form of cross-memory GC, probably a sort of generalisation of a generational GC. I don’t think that a GC over stable memory, while technically possible, is viable with the System API.
There is a small detail that a shim cannot fully emulate: with the Wasm extensions, you can read values from stable memory or copy arbitrarily large blocks inside stable memory directly. A shim would always need to reserve a “sufficiently large” scratch area in regular memory to copy out and in again, which might get in the way of a specific heap layout, especially during GC.
I don’t think that a GC over stable memory, while technically possible, is viable with the System API.
No way to tell without trying :-). The kind of incremental GCs we are pondering for the IC have pretty good page-wise locality, don’t they? We want that anyways for normal GC… and then a page-level caching fake multi-memory might not be too bad. (Not too bad for me means just a few factors slower :-))
And even then, it would be good to have at least the design ready so that we are not rushed to implement something suddenly quickly once the system does provide the features and someone has promised on Twitter that all will be good now.
Ah, right, I forgot that the multi-memory extension also brings us bulk memory operations. Anyway, I would have imagined the fake-multi-mem instrumentation to need some region in regular memory to do its thing anyway (not a problem for us, the Motoko compiler can be helpful here, similar to the fake multi-value simulation now). Wouldn’t a single scratch page be enough, even for larger memcopies? If not, we can simply not use big bulk memory instructions in the compiler while we only simulate the support.
I agree that it would be good to have a design “ready”, but then there are plenty of other things that would be good to have yesterday.
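To make the single-scratch-page idea concrete, here is a rough, hand-written model in Motoko of a one-page write-back cache over today’s stable memory API. This is purely illustrative: the real thing would be compiler-emitted instrumentation, and the sketch assumes the pages touched have already been grown in stable memory.

```motoko
import StableMemory "mo:base/ExperimentalStableMemory";
import Blob "mo:base/Blob";
import Nat64 "mo:base/Nat64";

actor {
  // One 64 KiB page of stable memory cached in regular Wasm memory.
  let pageSize : Nat64 = 65536;
  var cached : ?Nat64 = null;      // index of the page currently held
  var page : [var Nat8] = [var];
  var dirty = false;

  // Ensure `page` holds stable-memory page `p`, flushing the old page
  // first if it was modified.
  func fill(p : Nat64) {
    switch (cached) {
      case (?q) {
        if (q == p) { return };
        if (dirty) {
          StableMemory.storeBlob(q * pageSize, Blob.fromArrayMut(page));
        };
      };
      case null {};
    };
    page := Blob.toArrayMut(
      StableMemory.loadBlob(p * pageSize, Nat64.toNat(pageSize)));
    cached := ?p;
    dirty := false;
  };

  // A "stable memory load byte" under the shim: at most one bulk read.
  func read8(addr : Nat64) : Nat8 {
    fill(addr / pageSize);
    page[Nat64.toNat(addr % pageSize)]
  };

  // A "stable memory store byte": buffered, flushed on page eviction.
  func write8(addr : Nat64, v : Nat8) {
    fill(addr / pageSize);
    page[Nat64.toNat(addr % pageSize)] := v;
    dirty := true;
  };
};
```

With good page-wise locality, most simulated accesses hit the cached page and cost only a bounds check and an array access; the expensive System API bulk copies happen only on page misses and evictions.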
As for cost, I’d expect 1-2 orders of magnitude cycle overhead using a shim. I don’t think that would be within the bounds of what even the most cost-insensitive devs would be happy to swallow. It would also vastly increase the likelihood of a method running into cycle limits.
Well, you could, but then 64 bit wouldn’t help much if you want a maintainable canister. The problem is the need to copy the entire heap to stable memory at every upgrade, which is expensive, and gets more expensive the larger the heap gets. In particular, it is likely to run out of cycles during upgrade long before you even hit the 4G limit. So you’d be stuck with an unupgradable canister.
Hi all, I would like to share the team’s current thinking on the topic of increasing canister smart contract memory. This is a topic we know is important to the community, and some people have been reaching out asking for the current status. With this update, we want to manage everyone’s expectations around the timeline of further increasing canister smart contract memory.
As discussed in our rollout plan when the original proposal was shared with you, we were going to start by bumping the limit for stable memory to 8GB and keep growing it as the system stabilizes. Based on the experience we have gathered running the production system since then, we have come to the conclusion that further increasing the limit for stable memory now (e.g. up to the subnet memory limit) provides marginal benefit to canisters while opening up extra risks, given some limitations of the current system.
Increasing the available stable memory of canisters can easily backfire if developers are not careful about accessing their bigger stable memory efficiently. A limit that is quite easy to hit is the instruction limit on canister upgrades (that is, unless you manually store all the data you need directly in stable memory, a task that is not trivial). Some of this will be alleviated by the upcoming DTS feature.
The stable memory API is still unsatisfactory, as it requires copying data to and from the Wasm heap and can easily become quite expensive cycles-wise if large chunks of memory need to be moved around. Future work includes adding support for wasm64 and multiple memories on the IC; these should greatly help improve the story around handling of stable memory, including the performance of stable memory calls.
We believe that the work outlined above should be done before we attempt to further increase the available size of stable memory, in order to provide developers with a safe and efficient platform to build on. As usual, we will keep the community posted on the progress of these necessary features, so please stay tuned.
I hate to ask this, but do you have any updates on the timeline for a possible wasm64 and multiple memories implementation?
I’m guessing there will be a proposal beforehand, and judging by the fact that there hasn’t been one submitted yet, it seems like it’s still quite a few quarters away.
The uncertainty puts developers (i.e. myself) in somewhat of a bind. For example, I develop in Motoko and am not sure whether I should migrate all of my Motoko stable variables to using the ExperimentalStableMemory library. It’s still marked as experimental.
Memory-64 and multiple memories are extension proposals for Wasm itself, which have not yet been standardised, although they are already at phase 3 of the Wasm proposal process. AFAIK, wasmtime (the engine used by the replica) already implements them, but it would be risky to start depending on them before their standardisation has been finalised (which would be phase 4). And that process is outside Dfinity’s direct control, it is blocked on other engine vendors implementing the proposals.
The reason Motoko’s ExperimentalStableMemory library is still marked experimental is that we don’t really like it as a long-term solution, since it is rather low-level, hard to use, expensive, and non-modular. Any library is free to access it and can thus mess with any other library’s use of stable memory. It’s like a big global variable that everyone in the program has access to, which goes against the grain of Motoko’s focus on safety.
A better design would be for the main actor to selectively hand out handles to isolated regions of stable memory that libraries could use, but only once given a capability to use them. That would, for example, prevent libraries from messing with each other’s stable memory invariants. We don’t, offhand, have a good design for this, especially one that allows individual regions to grow yet remain contiguous with efficient access.
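For illustration only, such a capability-based design might be shaped something like the sketch below. Every name here is invented; nothing like this exists in the current base library.

```motoko
// Hypothetical interface sketch: a Region value acts as a capability
// for one isolated slice of stable memory. All names are invented.
type Region = {
  grow : Nat64 -> Nat64;            // grow this region by some pages
  loadBlob : (Nat64, Nat) -> Blob;  // read, bounds-checked to this region
  storeBlob : (Nat64, Blob) -> ();  // write, bounds-checked to this region
};

// The main actor would create regions and hand them to libraries, so
// a library can only touch the regions it was explicitly given, e.g.:
//   let log : Region = Regions.new();
//   StableLog.init(log);   // StableLog cannot reach any other region
```

The hard part alluded to above is the representation underneath: letting each such region grow independently while keeping its pages contiguous (or at least cheaply addressable) in one flat stable memory.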
I would not refrain from taking a dependency on the library if it suits your purpose, just be aware that the API may change some time in the future. That being said, I don’t see that happening anytime soon.
Also, be aware that the cost of accessing stable memory is quite high at the moment: even reads and writes of individual words require allocating space in the Wasm heap and performing a bulk transfer between Wasm memory and stable memory, which is expensive, particularly for small transfers, due to the peculiarities of the current System API.
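Concretely, with today’s ExperimentalStableMemory library, even a word-sized access is a System API call with bulk-copy machinery underneath (the offset below is arbitrary, and the sketch grows a first page if none exists):

```motoko
import StableMemory "mo:base/ExperimentalStableMemory";

actor {
  public func demo() : async Nat32 {
    // Ensure at least one 64 KiB page of stable memory is allocated.
    if (StableMemory.size() == 0) { ignore StableMemory.grow(1) };
    // Each access below is a separate System API round trip; for
    // word-sized transfers the fixed per-call cost dominates.
    StableMemory.storeNat32(8, 42);
    StableMemory.loadNat32(8)
  };
};
```

Compare this with an ordinary Motoko variable, where the same read or write compiles to a plain Wasm memory access.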
We have some ideas on how replicas could lower these costs substantially, but at the moment they are just ideas and rely on the above Wasm proposals becoming fully available.
(Note: for the context of my response, I am referring to “stable” as using the stable keyword, not the ExperimentalStableMemory library.)
Does this mean that declaring a stable variable in Motoko, and reading/updating that variable (or record fields within that variable) is much more expensive than keeping that variable unstable, but saving it via preupgrade/post upgrade methods?
For example, are there specific benefits to keeping a variable as unstable, but then persisting that variable to stable memory only during upgrades as opposed to always having that variable be stable?
Let’s say I have a List<Nat> data structure (which is a stable type). Are there any performance benefits to initializing this list as an unstable variable (no stable keyword) and then using the system preupgrade/postupgrade APIs to save it to stable memory only during upgrades, vs. just initializing the List with the stable keyword from the start and keeping it that way (no additional steps needed for upgrades)?
If so, a pro-con list of the two scenarios (variable always stable vs. unstable but write to stable memory via system apis on upgrade) would be nice, along with estimated costs in terms of cycles and performance (i.e. a read heavy vs. write heavy application, or one that might get upgraded frequently, etc.).
At the moment, there is no performance difference between stable and ordinary variables during normal computation. The only difference is that stable variables are serialized and deserialized in bulk on upgrade, while ordinary variables are just discarded.
If we move to an implementation where stable variables are maintained in stable memory throughout their lifetime, rather than just between upgrades, then there may well be a performance tradeoff between the two sorts of variables.
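For reference, the two patterns being compared look like this in Motoko (a sketch; the List payload is arbitrary):

```motoko
import List "mo:base/List";

actor {
  // Pattern A: stable variable; persists across upgrades automatically.
  stable var xs : List.List<Nat> = List.nil<Nat>();

  // Pattern B: ordinary variable, manually saved via upgrade hooks
  // into a stable "shadow" variable.
  var ys : List.List<Nat> = List.nil<Nat>();
  stable var ysSaved : List.List<Nat> = List.nil<Nat>();

  system func preupgrade() { ysSaved := ys };
  system func postupgrade() { ys := ysSaved; ysSaved := List.nil<Nat>() };
};
```

As noted above, both cost the same during normal execution today; the manual variant only adds a hook where you could transform or prune the data on its way in and out of an upgrade.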
In the meantime, I’m wondering if Motoko has incorporated any changes recently that can help developers use as much of the 4 GB canister memory as possible. For example, I believe the compacting GC was launched a while back, and some other changes may have landed since then.
Do you happen to know how much of the 4 GB we can use with stable variables before running into issues serializing/deserializing during upgrade? If that limit is, say, 3 GB, what would happen if we try upgrading a canister with more than 3 GB worth of stable variables?
These kinds of scenarios (almost always involving some sort of canister upgrade) are a bit frightening to think about…
Motoko, by default, currently still uses the copying collector, as that conveniently leaves us with half of Wasm memory to perform naive serialization of stable variables, but it limits us to roughly 2GB of live data. Our current serializer serializes stable variables to Wasm memory before copying the blob to stable memory, which is suboptimal.
We are working to replace our current serializer so that it can serialize stable variables directly to stable memory in a streaming fashion, reducing the current 2x space overhead drastically. This should enable us to recommend or even default to the compacting GC. That should make much more than half of wasm memory available for live data.
The replica team is also working on allowing long-running messages that span several rounds, which should mitigate the current risk of running out of cycles during large GCs and upgrades.
That’s great. So even if we use the compacting GC right now, it won’t be that effective because the streaming stable variable feature hasn’t landed yet. Is that an accurate summary? Do you know if streaming is on the order of months or weeks away?
which should mitigate the current risk of running out of cycles
Do you mean the per-block instruction limit? I wasn’t aware that deterministic time slicing would actually save the user some cycles.
No, what Claudio meant is that a message will be able to run across multiple rounds (on each round consuming up to the limit allowed for a single execution), therefore being able to run longer overall and not be limited by the single message limit. In fact, running a long computation will likely cost a bit more in total but the key is you’ll actually be able to do it. More details in the relevant thread.
If you use the compacting GC right now, and the heap has lots of data, then you run the risk of not having enough heap space left to serialize your stable variables to the heap on the way to stable memory.
I think the streaming serialization feature will be out in Motoko in a small number of weeks - the PR is almost ready for review - though when it is included in dfx is out of our hands.
Deterministic Time Slicing (running messages for multiple rounds, with higher budgets) is a bigger change, so I expect a few months rather than weeks.