Motoko stable memory in 2022

As of today, wasmtime has implemented:

  • bulk memory operations
  • memory64
  • multi-memory

The first is fully standardized. The latter two are in a late stage of standardization. IMO it’s only a matter of time (i.e. 1-2 years) before they are fully standardized.


In Motoko, stable memory is accessed either via stable variables or via the new ExperimentalStableMemory library.

The first is dev-friendly and fast, but requires annoying (and expensive) preupgrade and postupgrade hooks to copy data from the IC’s stable memory into the canister Wasm’s linear memory (and vice versa). Of course, that means this approach is limited by the canister Wasm’s linear memory, which is currently 4 GB. (In reality it’s less, closer to 2 GB or 3 GB depending on the Motoko GC you use during compilation.) These hooks can also trap if they hit a cycle limit, which is bad.
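For concreteness, here is a minimal sketch of that pattern (the users map and its names are purely illustrative): the in-flight data lives in a non-stable HashMap on the heap, and the hooks shuttle it through a stable array around each upgrade.

import HashMap "mo:base/HashMap";
import Iter "mo:base/Iter";
import Text "mo:base/Text";

actor {
  // In-flight data: a non-stable structure on the Wasm heap.
  var users = HashMap.HashMap<Text, Nat>(16, Text.equal, Text.hash);

  // Stable variable that survives upgrades.
  stable var userEntries : [(Text, Nat)] = [];

  system func preupgrade() {
    // Copy the heap structure into the stable variable before upgrading.
    userEntries := Iter.toArray(users.entries());
  };

  system func postupgrade() {
    // Rebuild the heap structure from the stable variable, then drop the copy.
    users := HashMap.fromIter<Text, Nat>(
      userEntries.vals(), userEntries.size(), Text.equal, Text.hash);
    userEntries := [];
  };
};

Both copies pass through linear memory, which is where the size limit and the cycle-limit risk come from.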

The second is safe and gives access to more storage, up to 300 GB supposedly. But it’s not dev-friendly at all: you have to manually convert high-level data structures like Tries into low-level reads and writes on primitives like Nat32 and Blob. It’s also slower at runtime due to System API calls, but in return it requires no preupgrade or postupgrade hooks. Unfortunately, it’s also marked as experimental, so it’s probably not safe to rely on long-term.
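To make “manual” concrete, here is a sketch using a hypothetical fixed layout (a Nat32 counter at offset 0, a length-prefixed blob at offset 4); all offsets, growth, and bookkeeping are the application’s responsibility:

import StableMemory "mo:base/ExperimentalStableMemory";
import Nat32 "mo:base/Nat32";
import Text "mo:base/Text";

actor {
  public func save(count : Nat32, note : Text) : async () {
    if (StableMemory.size() == 0) {
      ignore StableMemory.grow(1); // one 64 KiB page; assumes the note fits in it
    };
    StableMemory.storeNat32(0, count);
    let blob = Text.encodeUtf8(note);
    StableMemory.storeNat32(4, Nat32.fromNat(blob.size()));
    StableMemory.storeBlob(8, blob);
  };

  public func load() : async (Nat32, ?Text) {
    // Assumes save has been called at least once.
    let count = StableMemory.loadNat32(0);
    let len = Nat32.toNat(StableMemory.loadNat32(4));
    (count, Text.decodeUtf8(StableMemory.loadBlob(8, len)))
  };
};

Nothing here needs preupgrade or postupgrade hooks, but every data structure has to be flattened into this kind of layout by hand.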


I’m wondering what the plan is regarding stable memory in 2022, given the recent developments in wasmtime. I’m especially curious about Motoko, which is what I use.

Stable memory is really important. Almost all canisters will need to be upgraded at some point, so most data on the IC actually needs to be stored in stable memory.

My understanding from reading this thread is that the best way forward is to:

  • Store stable memory (which is 64-bit) in a separate 64-bit wasm memory, which is now possible due to wasmtime having implemented memory64 and multi-memory
  • Amend the System API (or other low-level IC interface) to allow that new wasm memory to cheaply “persist” to IC stable memory, without relying on expensive System API calls through the existing stable memory interface (e.g. ic0.stable64_read)
  • Implement cross-memory garbage collection, which is necessary in a high-level language like Motoko

Does that sound right? If so, is this on the roadmap for this year?

Dealing with storage on the IC is perhaps the biggest headache I’ve encountered while developing. It’d be great to have a more unified interface to stable memory, without having to worry about a bunch of gotchas (that are typically gleaned from some post hidden in some thread).

Thank you!

20 Likes

I would say the first two points have always been on the roadmap, in the sense that the system API was initially conceived with the intention of ultimately replacing it with much more efficient (and now 64-bit) multiple memories. I have no insight into the scheduling of that work, though.

Regarding Motoko cross-memory garbage collection: that would be nice, if only it were that simple.

One of the challenges of stable memory is that it is more than just some extra store: to be useful for upgrade, the format of the data it contains must be compatible across not only canister but also compiler upgrades. It is for that reason that Motoko actually copies out the stable variable data to an (extension of) Candid instead of doing just a raw dump of the stable variable heap representation. We want to decouple the representation of the data at rest from the in-flight data representation, so the compiler is free to evolve its in-memory representation as it likes, e.g. to adopt 64-bit pointers, change the object representation, or whatever.

So the honest answer is that we would love to do something better, and have discussed it internally frequently, but don’t really know what that something is at the moment. In the short term, Motoko could certainly take advantage of native Wasm access to stable memory, for example to improve the perf of ExperimentalStableMemory.mo, and eventually offer something better.

(I do wonder how easy it would be for Rust to take much advantage of Wasm multiple memories, since it presumably requires LLVM and Rust code to be aware of multiple memories and produce Wasm instructions exploiting them.)

In the near term, there is other work afoot that will hopefully reduce the risk of running out of cycles during Motoko stable variable serialization in the upgrade hooks, as well as plans to mitigate the space overhead of stable variable serialization, which currently does more copying than it should (serializing into main memory before copying out to stable memory, and vice versa). Multiple-memory instructions would make optimizing serialization both easy and cheap.

Of course, another alternative would be to use stable memory for a more traditional in-memory database or file system that might be easier to expose to Rust and provide a more tried and trusted route to upgrade safety, by decoupling application state from the transient, run-time data representation that can and arguably should be able to change from one version of the compiler to the next. Orthogonal persistence is all very well, until you need to consider the ability to upgrade, where having an explicit, independent representation of the application state turns out to be very handy.

I’d love to hear suggestions about how to move forward, or pointers to existing work in this area.

11 Likes

Thanks, this is a helpful overview.

It is for that reason that Motoko actually copies out the stable variable data to an (extension of) Candid instead of doing just a raw dump of the stable variable heap representation.

Is the purpose of Candid here to check that a canister’s stable memory at some version A is compatible with its stable memory at some version B?

Let’s say I don’t change my Motoko code between A and B, but I do upgrade my moc compiler, and moc changed the format of how stable variables are laid out in stable memory. So now I get a Candid warning when trying to upgrade. What can I even do about that? Is my only choice to downgrade my compiler version and file an issue on GitHub?

In the short term, Motoko could certainly take advantage of native Wasm access to stable memory, for example, to improve the perf of ExperimentalStableMemory.mo and eventually offer something better.

How different would that “something better” look compared to ExperimentalStableMemory.mo? A big concern I have with relying on that library right now is that it’s marked “experimental”. I wonder what a new API would look like.

In the near term, there is other work afoot that will hopefully reduce the risk associated with running out of cycles during Motoko stable variable serialization in the upgrade hooks as well as plans to mitigate the space overhead of stable variable serialization which currently does more copying than it should (serializing into main memory before copying out to stable memory and vice versa).

Is this perhaps an advantage of splitting up a single canister into multiple canisters?

For example, if your one canister stores a bunch of “tables” (e.g. users, posts, topics), maybe it’s better to split that canister into 3 separate ones, one for users, one for posts, one for topics. Then, if you need to update the schema for a topic, you would only need to upgrade the topics canister, which would incur a smaller cycle cost, since only the stable memory for the topics canister needs to be serialized and deserialized. Also, each canister would get access to its own 4 GB (potentially up to 300 GB) of stable memory. Is that a fair assessment?

1 Like

The reuse of Candid’s serialisation mechanism to save stable variables is just an implementation detail (and in retrospect, probably a bad one that will change eventually). It is not semantically observable by the programmer.

Just a suggestion to not give up on the dream of orthogonal persistence, where developers can store very large amounts of data in data structures existing in the same memory their code runs in without hassle. I hope we can work towards the linear memory or heap being increased greatly and abstracting away the concept of stable memory. I don’t want to have to think about all of that as a developer (even a developer of libraries).

Is there an in-depth explanation of how memory is working on the IC? I don’t quite understand all of the limitations and how/why stable memory is necessary, the serialization required, etc. I’d love to dig in more to try and come up with solutions.

9 Likes

Fully agree with this. I recently wrote about my thoughts in designing the data schema for tipjar. Essentially I used mutually recursive data types instead of normalized tables like in a database.

Basically I don’t want to keep upgrading my canisters. In almost all other blockchains, smart contracts are designed to be immutable, and people seem to be fine with that. So give me linear memory (that can grow as needed) and I’d be fine not having to think about stable memory & upgrades.

7 Likes

@PaulLiu, if you never want to upgrade, then you indeed don’t have to bother with stable memory, even now.

1 Like

I totally agree with this sentiment. Orthogonal persistence was one of the main things that attracted me to the IC (from a purely programming perspective, not even blockchain related).

Fully agree with this. I recently wrote about my thoughts in designing the data schema for tipjar. Essentially I used mutually recursive data types instead of normalized tables like in a database.

Interesting, thanks for sharing. By “direct object references”, do you just mean nested objects? Like a User record directly contains an array of Allocation records? I wasn’t aware that could be mutually recursive.

Do you mean this would work?

type A = {
    b : B;
};

type B = {
    a : A;
};

(I would try this out but I’m on a phone right now.)

2 Likes

Just to follow up on this, I believe I will hit memory limits using Motoko stable variables very soon. There are workarounds, but I wanted to ask…

What is your plan with ExperimentalStableMemory? Can I build on it? Will the interface change in the near future?

5 Likes

(@Manu asked someone to follow up on this)

The library is marked experimental because it’s rather easy to shoot yourself in the foot. In particular, without coordination, separate libraries that import ExperimentalStableMemory can easily wind up trashing each other’s memory.

That said, you can build on this if you are careful. The library will at most be replaced by something roughly similar but with better isolation guarantees.

Regarding the recent extension of stable memory limits from 8GB to 32GB:

The library already uses 64-bit addresses, so it supports the 32GB stable memory limit out of the box.
However, in order to do that, you do need to tell the compiler how many stable memory pages (at most) to dedicate to ExperimentalStableMemory.mo, using the --max-stable-pages <n> compiler flag. (With dfx, you can set this with the optional “args” string property of a Motoko canister in the dfx.json file - this contains additional command-line arguments to pass to the moc compiler during a build.)
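For example, a hypothetical dfx.json fragment (canister name and path made up) raising the cap to 131072 pages, i.e. 8GB:

{
  "canisters": {
    "storage": {
      "type": "motoko",
      "main": "src/storage/main.mo",
      "args": "--max-stable-pages 131072"
    }
  }
}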

By default, the compiler allows at most 65536 (64K) pages (4GB), reserving the remainder (previously 4GB, but now 28GB) for Motoko stable variable storage.

From https://github.com/dfinity/motoko-base/blob/master/src/ExperimentalStableMemory.mo:

Memory is allocated, using grow(pages), sequentially and on demand, in units of 64KiB pages, starting with 0 allocated pages. New pages are zero-initialized. Growth is capped by a soft limit on page count controlled by the compile-time flag --max-stable-pages <n> (the default is 65536, or 4GiB).

NB: The IC’s actual stable memory size (ic0.stable_size) may exceed the page count reported by the Motoko function size(). This (and the cap on growth) is to accommodate Motoko’s stable variables. Applications that plan to use Motoko stable variables sparingly or not at all can increase --max-stable-pages as desired, approaching the IC maximum (currently 8GiB). All applications should reserve at least one page for stable variable data, even when no stable variables are used.
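As a small illustration of growing on demand against that soft limit, here is a sketch of a helper; the failure sentinel mirrors ic0.stable64_grow and is an assumption about the library’s behaviour, so check the current documentation before relying on it.

import StableMemory "mo:base/ExperimentalStableMemory";

module {
  // Grow stable memory until at least `pages` pages are allocated.
  // Returns false if the soft limit (--max-stable-pages) would be exceeded.
  public func ensurePages(pages : Nat64) : Bool {
    let current = StableMemory.size();
    if (current >= pages) return true;
    // Assumption: grow returns the previous size on success and
    // 0xFFFF_FFFF_FFFF_FFFF when the request cannot be satisfied.
    StableMemory.grow(pages - current) != 0xFFFF_FFFF_FFFF_FFFF
  };
};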

A tutorial sample using ExperimentalStableMemory is here (though it doesn’t mention the compiler flag):

https://internetcomputer.org/docs/current/developer-docs/build/cdks/motoko-dfinity/stablememory/

(An example of passing (other) command-line args to moc via dfx.json is here: Stable-types build error moving to 0.9.3 and later versions of the SDK - #35 by claudio)

9 Likes

@claudio Thanks for making this post! It’s very helpful.
I have a question about setting --max-stable-pages for canisters created dynamically (from a parent canister). Do they take the max pages from their parent? Or is there a different way to set them?

Do you mean: do instances of an imported actor class receive the same --max-stable-pages setting as the importing actor?

Yes, I think that will be the case.
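For context, dynamic creation from a parent typically looks like the sketch below (Bucket.mo is a hypothetical file defining actor class Bucket()). The class is compiled as part of the same build, so it is built with the parent’s moc flags, including --max-stable-pages.

import Cycles "mo:base/ExperimentalCycles";
import Buckets "Bucket"; // hypothetical: Bucket.mo defines `actor class Bucket() { ... }`

actor Parent {
  public func makeBucket() : async Buckets.Bucket {
    // Creating a canister costs cycles, paid from this canister's balance.
    Cycles.add(1_000_000_000_000);
    await Buckets.Bucket() // installs a fresh canister running the Bucket class
  };
};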

1 Like

Yes, that’s what I mean, creating an instance of another canister.
Thank you!

Is there any hope of stable variables in Motoko being tied to the stable memory we are talking about here? I’m guessing the stable variable use case is still limited to the 8GB heap (and now we can use most of it because of the better streaming upgrade code)?

2 Likes

Stable variables reside in the 4GB main heap in flight, and get copied to/from stable memory only during upgrade, so the 4GB limit still applies, I’m afraid.

You can now access up to 32GB (was 8GB) using stable memory, though you should reserve up to 4GB for any stable variables you may use.

I wonder if it’s possible to make stable variables directly store their bytes in stable memory, given the new performance improvements in using the System API.

That way, the memory type (i.e. heap vs. stable memory) would be completely transparent to the developer.

1 Like

The overhead of storing stable data in stable memory is not just that stable memory itself is more costly to access; it is also that a stable data layout is necessarily less efficient and flexible. In particular, storing something in a stable variable would then require not just moving a pointer but a deep copy of the entire data each time, which can have extremely non-obvious cost.

Moreover, the stable data layout can never be changed, so it would never be possible to tune the memory layout, GC, etc. for it.

So, while it is possible in principle to store stable data in stable memory, in practice it would likely still be many times more costly, with no chance of future improvement.
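To illustrate the difference with a hypothetical large array: today, assigning to a stable variable is a pointer update, and the copy into a stable format happens only at upgrade time.

import Array "mo:base/Array";

actor {
  stable var current : [var Nat] = Array.init<Nat>(1_000_000, 0);
  stable var backup : [var Nat] = Array.init<Nat>(1_000_000, 0);

  public func swap() : async () {
    // Today: two pointer updates, independent of the array size.
    // If these variables lived directly in stable memory in a stable layout,
    // each assignment would instead imply a deep copy of 1,000,000 elements.
    let tmp = current;
    current := backup;
    backup := tmp;
  };
};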

1 Like

Dumb question, but why can stable data layout not be changed? And why can’t pointers be used?

I thought each canister had its own stable memory. Or is it because the heap can be wiped during a canister upgrade but stable memory cannot? I think I’m missing something.

The pointer cannot be used when it points into non-stable memory, because then the data wouldn’t be stable.

The layout cannot be changed because the whole point of stable memory is that it survives upgrades and remains usable afterwards. Since the upgraded code may have been compiled with an arbitrary future version of the Motoko compiler, that arbitrary future version needs to remain compatible with the old stable memory layout.

3 Likes

And we are still waiting on Wasm to add 64-bit support so the heap can grow to more than 4GB, right? Any movement on that front?

5 Likes