Wow, this is some really dope work, it might completely change my current strategy of doing things, thanks a lot. Really interested in how things work under the hood.
v0.2.0
`ic-stable-memory` has been promoted to v0.2.0 with a huge update and breaking changes!
Breaking changes
1. Bye Candid, hello Speedy
`Speedy` is a serialization library which is now used by `ic-stable-memory` instead of `Candid`.
This has some implications: both good and bad.
Bad:
- Developers are now obligated to use both `Candid` and `Speedy` in their canisters if they want to use `ic-stable-memory`, which has a negative impact on module size.
- This is a breaking change, so you have to make the structures you store in stable memory compatible with `Speedy`. This is pretty easy though - just switch the `CandidType` and `Deserialize` derive macros to `speedy::Readable` and `speedy::Writable`.
- A `Motoko` version of `ic-stable-memory` would now be very hard to implement. I didn't have a plan to do it anyway, but without `Candid` it would take too much effort to even try.
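The derive switch mentioned above can be sketched like this (the `User` struct is hypothetical; assumes the `speedy` crate is in your dependencies):

```rust
use speedy::{Readable, Writable};

// Before, for Candid-based stable storage:
// #[derive(CandidType, Deserialize)]
// struct User { name: String, age: u8 }

// After - the same struct, made Speedy-compatible:
#[derive(Readable, Writable)]
struct User {
    name: String,
    age: u8,
}
```

You still keep `CandidType`/`Deserialize` on the types that cross the canister boundary; only the types stored in stable memory need the `Speedy` derives.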
Good:
- `Speedy` is a lot faster than `serde` (up to x10 in some benchmarks) and has a better memory footprint than `Candid` (it does not store the `DIDL` magic string or anything like that). In other words, it just works better for storage-related operations (but `Candid` is still the top choice when it comes to networking and interoperability).
- IMO `Speedy` is easier to work with. It has the same kind of `derive` macro system, but it is easier to implement these macros by hand when you really need to for some reason.
2. “Memory low” handler system
Previously, every function of `ic-stable-memory` that allocates new stable memory returned `Result<T, OutOfMemory>`. You could use this return value to understand whether the canister's memory was exhausted. This worked badly for two reasons:
- Poor API. Basically, you had to append `.unwrap_or_else(...)` to each allocating method of any stable collection (e.g. `svec.push()`) in order to catch this `OutOfMemory` error. The code looked bloated with this garbage, which hurt readability.
- On my side, I had to implement stable collections in a transactional fashion, so each method, if it threw an error, would reset to the previous state. For some collections this was really easy to achieve, but for others (like a btree map) it was a pain.
Now I have removed this `OutOfMemory` from the user-facing API completely. Instead, the library will always make sure that you have some stable memory available. Once your canister reaches the memory limit (but still has some free memory left), a special user-defined handler update function `on_low_stable_memory()` will be called. In the body of this function you can do anything you want: spawn a new canister, send an e-mail to the developer and so on. Once the memory is really over, the library will just trap, keeping the state safe from corruption.
As a breaking change, this will require some work from you to support. All you have to do is move your `OutOfMemory` reaction logic to `on_low_stable_memory()` and clean your code of every `unwrap()` or `expect()` that followed your stable collection method invocations.
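A minimal sketch of such a handler (assumes `ic-cdk`; the reaction logic here is entirely hypothetical - only the `on_low_stable_memory` name comes from the library):

```rust
use ic_cdk_macros::update;

// Called by ic-stable-memory when the canister nears its stable memory limit
// but still has some headroom left. React however fits your app: spawn a
// sibling canister, notify an operator, flip a read-only flag, etc.
#[update]
fn on_low_stable_memory() {
    // Hypothetical reaction: log and let an off-chain monitor pick it up.
    ic_cdk::println!("stable memory is almost full");
}
```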
The GitHub readme file now has a section about this mechanism.
Other updates
- `SBTreeMap` and `SBTreeSet` are now available as stable collections. They are pretty slow at the moment, but they work. The total list of supported collections: `SVec`, `SBinaryHeap`, `SHashMap`, `SHashSet`, `SBTreeMap` and `SBTreeSet`.
- It is possible to use `ic-stable-memory` with the stable Rust channel (nightly is no longer a requirement).
- Optimizations. All stable collections (except for BTree-based ones) now work 20-40 times faster than before (the repo also contains benchmarks now).
- Stability improvements.
What’s next
- Update the tutorials to reflect these new details (UPD: done).
- Focus on stability and performance. The goal is to reach a state where it is economically reasonable to use `ic-stable-memory` instead of the standard approach with horizontal scaling and the `pre_upgrade`/`post_upgrade` routine.
Wow, the improvements look super nice! I'll definitely give this lib a try for a new storage project soon!
One question tho: what was the incentive to use speedy over another serializer like MessagePack (which Distrikt, OpenChat and Dcvr already use extensively), for example? In my findings it gives around the same performance improvements.
Great to hear that, thanks!
I used benchmarks as the main source of info for this decision. For example, this one says that speedy is much faster than rmp and has a better memory footprint.
I like this benchmark repo because they compare against Abomonation, which I find very reasonable, since you just can't make a faster thing than Abomonation.
But, like the song goes: in the end it doesn’t even matter. Any storage-focused library would work just fine and will get its job done. Thanks for your feedback on that one!
I’ve been playing with Rust stable memory for some sample dapps, so very eager to add this to my list. Thanks!
Definitely trying this.
Thank you for enabling Rust stable channel support.
@senior.joinu I tried it and ran into this bug.
Opened an issue for it here
There’s a minimal reproduction in the issue. Let me know if there’s any other detail I can provide
Hey there!
Fixed. Check the issue for more details.
Hi @senior.joinu, I ran into another issue, with `SPrincipal`. I created a separate issue for it here with a minimal reproduction.
Let me know if there are any further details I can provide.
Hello again!
New version of the library with this issue fixed is available.
Thanks!
Thank you. Works fine as far as I’ve tested.
0.4.0-rc1
Hey there! Huge stability, performance & other updates here.
`ic-stable-memory 0.4.0` release candidate #1 is now published.
The goal for this version was to improve the library so that it would be reasonable to use it instead of the standard collections. Previous versions had a lot of problems with performance (some collections were 1.5k times slower than the standard ones). But this time everything is much better.
Cons
First, let’s talk about stuff that some might find upsetting.
- Collections from `0.4.0-rc1` are not backwards compatible with older ones. This is because they now internally use a custom super-fast (up to 5 times faster than Speedy) serialization engine, and because of that they store data differently. If you need a migration guide, please let me know (but I hope that most people simply don't use `ic-stable-memory` in production).
- `ic-stable-memory` is now a nightly-only library, because it strongly relies on a couple of unstable features, namely `generic_const_exprs` and `thread_local`. So now, in order to use the library, you have to install the `rust 1.66 nightly toolchain` and also append these lines to the beginning of your `lib.rs`:
#![feature(thread_local)]
#![feature(generic_const_exprs)]
- `ic-stable-memory` no longer takes care of horizontal scaling - I found out that developers themselves should be able to do a much better job of it for their exact use-case. So the `max_allocation_pages` and `max_grow_pages` settings are removed, as is the `on_low_stable_memory` hook. See this thread for more info on how to enable horizontal scaling for your canister.
- Most collections (except `SBTreeMap` and `SBTreeSet`) are no longer "infinite" - they are now limited to `2**32` elements total. I'm planning on adding an "infinite" `SVec` back in the near future.
Pros
Performance improvements
Vec
Before
"Classic vec push" 1000000 iterations: 463 ms
"Stable vec push" 1000000 iterations: 22606 ms (x49 slower)
"Classic vec pop" 1000000 iterations: 406 ms
"Stable vec pop" 1000000 iterations: 11338 ms (x28 slower)
"Classic vec search" 1000000 iterations: 127 ms
"Stable vec search" 1000000 iterations: 2926 ms (x23 slower)
After
"Classic vec push" 1000000 iterations: 46 ms
"Stable vec push" 1000000 iterations: 212 ms (x4.6 slower)
"Classic vec search" 1000000 iterations: 102 ms
"Stable vec search" 1000000 iterations: 151 ms (x1.4 slower)
"Classic vec pop" 1000000 iterations: 48 ms
"Stable vec pop" 1000000 iterations: 148 ms (x3 slower)
Hash map
Before
"Classic hash map insert" 100000 iterations: 224 ms
"Stable hash map insert" 100000 iterations: 7199 ms (x32 slower)
"Classic hash map remove" 100000 iterations: 123 ms
"Stable hash map remove" 100000 iterations: 3618 ms (x29 slower)
"Classic hash map search" 100000 iterations: 69 ms
"Stable hash map search" 100000 iterations: 2325 ms (x34 slower)
After
"Classic hash map insert" 100000 iterations: 96 ms
"Stable hash map insert" 100000 iterations: 387 ms (x4 slower)
"Classic hash map search" 100000 iterations: 47 ms
"Stable hash map search" 100000 iterations: 113 ms (x2.4 slower)
"Classic hash map remove" 100000 iterations: 60 ms
"Stable hash map remove" 100000 iterations: 99 ms (x1.6 slower)
Hash set
Before
"Classic hash set insert" 100000 iterations: 209 ms
"Stable hash set insert" 100000 iterations: 5977 ms (x28 slower)
"Classic hash set remove" 100000 iterations: 180 ms
"Stable hash set remove" 100000 iterations: 2724 ms (x15 slower)
"Classic hash set search" 100000 iterations: 125 ms
"Stable hash set search" 100000 iterations: 2007 ms (x16 slower)
After
"Classic hash set insert" 100000 iterations: 79 ms
"Stable hash set insert" 100000 iterations: 394 ms (x4.9 slower)
"Classic hash set search" 100000 iterations: 54 ms
"Stable hash set search" 100000 iterations: 97 ms (x1.8 slower)
"Classic hash set remove" 100000 iterations: 56 ms
"Stable hash set remove" 100000 iterations: 99 ms (x1.7 slower)
BTree map
Before
"Classic btree map insert" 10000 iterations: 31 ms
"Stable btree map insert" 10000 iterations: 8981 ms (x298 slower)
"Classic btree map remove" 10000 iterations: 17 ms
"Stable btree map remove" 10000 iterations: 19831 ms (x1166 slower)
"Classic btree map search" 10000 iterations: 15 ms
"Stable btree map search" 10000 iterations: 20710 ms (x1380 slower)
After
"Classic btree map insert" 100000 iterations: 267 ms
"Stable btree map insert" 100000 iterations: 17050 ms (x63 slower)
"Classic btree map search" 100000 iterations: 138 ms
"Stable btree map search" 100000 iterations: 566 ms (x4.1 slower)
"Classic btree map remove" 100000 iterations: 147 ms
"Stable btree map remove" 100000 iterations: 1349 ms (x9.1 slower)
BTree set
Before
"Classic btree set insert" 10000 iterations: 26 ms
"Stable btree set insert" 10000 iterations: 8920 ms (x343 slower)
"Classic btree set remove" 10000 iterations: 13 ms
"Stable btree set remove" 10000 iterations: 19601 ms (x1507 slower)
"Classic btree set search" 10000 iterations: 16 ms
"Stable btree set search" 10000 iterations: 20569 ms (x1285 slower)
After
"Classic btree set insert" 100000 iterations: 312 ms
"Stable btree set insert" 100000 iterations: 1771 ms (x5.6 slower)
"Classic btree set search" 100000 iterations: 170 ms
"Stable btree set search" 100000 iterations: 600 ms (x3.5 slower)
"Classic btree set remove" 100000 iterations: 134 ms
"Stable btree set remove" 100000 iterations: 1317 ms (x9.8 slower)
Binary heap
Before
"Classic binary heap push" 1000000 iterations: 995 ms
"Stable binary heap push" 1000000 iterations: 29578 ms (x29 slower)
"Classic binary heap pop" 1000000 iterations: 4453 ms
"Stable binary heap pop" 1000000 iterations: 27159 ms (x6 slower)
"Classic binary heap peek" 1000000 iterations: 133 ms
"Stable binary heap peek" 1000000 iterations: 3314 ms (x25 slower)
After
"Classic binary heap push" 1000000 iterations: 461 ms
"Stable binary heap push" 1000000 iterations: 11668 ms (x25 slower)
"Classic binary heap peek" 1000000 iterations: 62 ms
"Stable binary heap peek" 1000000 iterations: 144 ms (x2.3 slower)
"Classic binary heap pop" 1000000 iterations: 715 ms
"Stable binary heap pop" 1000000 iterations: 16524 ms (x23 slower)
As you can see, in some cases the performance gains are obvious, but in others (like `SBinaryHeap`) not much has changed. This is because some algorithms are sensitive to indirect data and some aren't. Let me explain what I mean.
Direct & indirect storage
Previously, all collections in `ic-stable-memory` were indirect - for each element they allocated a new portion of stable memory to store it, while internally they only stored a pointer to that stable memory. This is what allowed these collections to hold any kind of dynamically sized data and to migrate that data to a new version (to add new fields) on-the-fly. But it was also a big problem from the performance perspective, since such a flow requires (de)allocating and (de)serializing each element.
This is why all stable collections are now optionally indirect. There is a special trait `StableAllocated`. This trait should only be implemented for fixed-size data (for something that could easily implement `Copy`). If your data implements this trait, you can pass it straight to a stable collection like so:
let vec = SVec::<MyData>::new();
and the collection will work in high-performance direct mode, storing data inline, right inside the collection. In this case your data will be serialized with the custom serialization engine that I mentioned before. This engine serializes everything without leaving the stack, straight into a fixed-size byte array, which makes it a lot faster than any other serialization engine, since all of them serialize into a heap-allocated buffer.
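The idea behind that engine can be illustrated in plain Rust - a fixed-size value is encoded into a `[u8; N]` that lives entirely on the stack, with no intermediate `Vec` (this is an illustrative sketch, not the library's actual code):

```rust
// A fixed-size record: two u64 fields -> always exactly 16 bytes.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Point {
    x: u64,
    y: u64,
}

impl Point {
    // Serialize straight into a stack-allocated array - no heap allocation.
    fn to_bytes(self) -> [u8; 16] {
        let mut buf = [0u8; 16];
        buf[..8].copy_from_slice(&self.x.to_le_bytes());
        buf[8..].copy_from_slice(&self.y.to_le_bytes());
        buf
    }

    fn from_bytes(buf: [u8; 16]) -> Self {
        let mut x = [0u8; 8];
        let mut y = [0u8; 8];
        x.copy_from_slice(&buf[..8]);
        y.copy_from_slice(&buf[8..]);
        Point { x: u64::from_le_bytes(x), y: u64::from_le_bytes(y) }
    }
}
```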
But if your data is dynamically sized (or you plan to update it to a new version one day, adding new fields), then you have to store it indirectly. For that there is a couple of new smart pointers: `SBox` and `SBoxMut`:
let vec = SVec::<SBox<MyData>>::new();
In that case, the collection will, as before, allocate a new portion of stable memory somewhere else and internally store only a pointer to that data. You can wrap any data that implements `speedy::Readable` and `speedy::Writable` in an `SBox` or `SBoxMut` - `speedy` is responsible for the serialization in this case.
This indirection polymorphism feature took me the longest to implement, and this is why the performance gains might not yet look like a big deal to some of you. Internally, the collections have changed drastically. This will allow for better optimizations in the future, so don't get upset - from now on the library will only become faster and faster.
Stability
Test coverage is increased up to 95%, and in reality it is a couple of percent higher, since `cargo tarpaulin` often shows some trivial lines as uncovered while they are definitely covered (try it out yourself). I found and fixed a lot of bugs during this work, and I hope the library is now much safer to use than it was before.
But it is still a young software, please don’t use it in production yet!
Memory fragmentation
The allocator now also works a lot faster, and it can reallocate memory in place, which makes it a lot easier to resist fragmentation. Not much to say here. I ran some tests internally, and it appears that fragmentation for a random dataset is not a big issue (around 5%).
QoL improvements
All collections now also have an `.iter()` method, which returns an iterator over that collection. For most of them this is a zero-cost, very efficient abstraction, but not for BTree-based collections - their iterators are pretty heavy and should only be used on small collections.
`SVec` now has a lot of additional methods, like `insert()`, `remove()`, `from::<Vec<_>>()` and `binary_search_by()`.
There is also an additional macro for "un-declaring" a stable variable - `s_remove!(Variable)` - which removes the variable from the registry and returns its content. If the data that was stored there is droppable (if it is a stable collection), then you have to manually release the memory by calling the `.stable_drop()` method on it:
let vec = s_remove!(MyStableVector);
unsafe { vec.stable_drop() };
Refactors
Once you try it out, you'll notice that some methods have changed their names (`.drop()` is now `.stable_drop()`, `.get_cloned()` is now `.get_copy()`, and so on). Some methods that were previously safe (`.from_ptr()` or `.drop()`) are now marked as unsafe. This is because I found the new names and signatures more descriptive and decided to change them while it is still possible to do so without hurting too many people.
What’s next?
I decided to add the `-rc1` postfix in order to focus your attention once again on the fact that this library is not yet ready to be used in production. My goal is to make version `0.4` as good as possible before actually releasing it - to make it a suitable replacement for the standard collections.
Release candidate #2 will be focused on some (probably AVL) augmented binary Merkle tree that will allow users of `ic-stable-memory` to easily certify the data they store.
I also have a plan to add an infinite `SVec` back (it will probably be named `SRope`), as well as to figure out a way to improve the `BTree`-based collections and the `BinaryHeap`, since their performance still seems poor to me. This amount of work looks to me like another release-candidate milestone.
After that there may be more release candidates, with the main goal of stabilizing the library as much as possible before telling everyone: "Yeah, you can use it in your canister".
Thanks for reading this far. Go try it out and tell me what you think!
By the way, the code for `0.4.0-rc1` is in `develop`, not in `master`.
Thank you so much for the amazing updates.
A migration path for canisters using `ic-stable-memory` v0.2.* would be amazing.
I am not completely sure how this works. An example would be really helpful.
If I have a struct `Profile` with `name` and then want to add another field `age`, how would the already existing `Profile` structs co-exist with the new two-field `Profile` in the same stable collection? If I get the value of an older struct, what would be returned in the `age` field?
I imagine any data structure containing a `String` type can't utilize this, right?
In that case, looking at the scope, shouldn’t the dynamically sized data option be the default?
Thanks! Sure, you can expect such a guide in a couple of days.
You just have to declare your structures a little bit differently. Imagine initially you have something like this:
#[derive(Readable, Writable)]
enum User {
    V1(UserV1)
}

#[derive(Readable, Writable)]
struct UserV1 {
    name: String,
    age: u8,
}
And you store those structs in some kind of stable collection:
type UsersMap = SHashMap<SPrincipal, SBox<User>>;
...
let mut users = s!(UsersMap);
let user_box: SBox<User> = users.get_copy(user_principal).unwrap();
// SBox and SBoxMut implement Deref, so you can match on the dereferenced value
match &*user_box {
    User::V1(user) => {
        // do stuff
    }
}
Since you’re storing users indirectly, you can simply declare a new version like this:
#[derive(Readable, Writable)]
enum User {
    V1(UserV1),
    V2(UserV2),
}

...

#[derive(Readable, Writable)]
struct UserV2 {
    name: String,
    age: u8,
    phone: String,
}
And use it in your code like this:
match &*user_box {
    User::V1(user) => {
        // do stuff
    },
    User::V2(user) => {
        // do stuff with the phone number also
    }
}
This is both safe and pretty easy to do.
It depends. First of all, if your string is limited in size (for example, it can't be bigger than 128 bytes), then you can simply convert your string to a `[u8; 128]` and store it that way. `StableAllocated` is implemented by default for all primitive numeric types and for `[u8]` arrays with a length equal to a power of two up to 4096 bytes (`0, 1, 2, 4, 8, 16, 32, 64 ... 4096`). Once `generic_const_exprs` gets more stable, it will be possible to implement it generically for `[u8]` arrays of any size.
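Such a conversion might look like this (an illustrative sketch in plain Rust; the helper names are made up, and note that this padding scheme assumes the string contains no NUL bytes):

```rust
// Pack a string of at most 128 bytes into a fixed-size, stack-friendly array.
// Returns None if the string does not fit.
fn pack_str(s: &str) -> Option<[u8; 128]> {
    let bytes = s.as_bytes();
    if bytes.len() > 128 {
        return None;
    }
    let mut buf = [0u8; 128];
    buf[..bytes.len()].copy_from_slice(bytes);
    Some(buf)
}

// Recover the string by trimming the zero padding.
fn unpack_str(buf: &[u8; 128]) -> String {
    let end = buf.iter().position(|&b| b == 0).unwrap_or(128);
    String::from_utf8_lossy(&buf[..end]).into_owned()
}
```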
Second, you still can split your struct into a couple of structs - detach a fixed-size part from the dynamically-sized one. Imagine you have these structs:
struct TransactionHistoryEntryBase {
    from: SPrincipal, // SPrincipal also implements StableAllocated
    to: SPrincipal,
    amount: SNat,     // SNat also implements StableAllocated
}

// there is no derive macro for this yet
impl StableAllocated for TransactionHistoryEntryBase {
    ...
}
#[derive(Readable, Writable)]
struct TransactionHistoryEntryDetails {
    memo: String,
    refs: Vec<u64>,
}
Together they represent a single business entity, `TransactionHistoryEntry`, but you can store the two parts in different locations:
type THEBaseLedger = SVec<TransactionHistoryEntryBase>;
type THEDetailsLedger = SVec<SBox<TransactionHistoryEntryDetails>>;
and you can access them separately. You'll still be able to update this data to a new version (by modifying the dynamically-sized part as I showed earlier), but at least some part of your data will be optimized.
And the third option is, as you said, to store everything indirectly using `SBox`-es.
About making the indirect flow the default one - yes, maybe you're right, but I'm not sure yet. What I didn't like about `0.2.0` is that this indirection of storage was hidden from the user. I find that very misleading - developers should know what their code is doing and how it works internally, preferably without having to read through the documentation. This is what I always liked about Rust itself - by default it always does things in the most optimal way.
In Rust there is also a notion of indirection. By default, when you allocate some data, and that data is sized, it will automatically be allocated on the stack (directly). Only if you explicitly ask for it to be allocated on the heap (switching to indirect mode) by putting it into a `Box` will that happen. My motivation was to replicate this behavior and make the library more idiomatic this way.
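The analogy in plain Rust (a trivial sketch; the struct and helper are made up):

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Fixed {
    a: u64,
    b: u64,
}

// Sized data lives directly on the stack by default (direct mode);
// an explicit Box moves it behind a pointer on the heap - the
// "indirect" mode that SBox mirrors for stable memory.
fn into_indirect(value: Fixed) -> Box<Fixed> {
    Box::new(value)
}
```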
Thank you for the explanations.
Makes perfect sense
@senior.joinu this looks amazing!
I’m keen to move all of the data in OpenStorage (OpenChat’s storage service) into stable memory.
One option is to use the StableStructures built by Dfinity, but that doesn’t work nicely when your values vary in size massively. I’d need to create buckets of increasing size which I want to avoid due to added complexity + memory wasted.
Our keys are all 256-bit hashes and our values are vecs of bytes of varying sizes.
It looks as if your SHashMap fits our use case perfectly.
Would you recommend using this in production yet?
Hey, great to hear that!
Yea, a hashmap would work, but if this collection is infinite, a btreemap would be better in the long run.
Unfortunately, not yet. It needs a bit more time for stabilization until I could recommend it for production.
I’m guessing you recommend BTreeMap because HashMap has to rehash all of the values every so often as items are inserted? Or is there another reason?