Wow, this is some really dope work, it might completely change my current strategy of doing things, thanks a lot. Really interested in how things work under the hood.
v0.2.0
`ic-stable-memory` has been promoted to v0.2.0 with a huge update and breaking changes!
Breaking changes
1. Bye Candid, hello Speedy
`Speedy` is a serialization library which is now used by `ic-stable-memory` instead of `Candid`.
This has some implications: both good and bad.
Bad:
- Developers are now obligated to use both `Candid` and `Speedy` in their canisters if they want to use `ic-stable-memory`, which has a negative impact on module size.
- This is a breaking change, so you have to make the structures you store in stable memory compatible with `Speedy`. This is pretty easy though - just switch the `CandidType` and `Deserialize` derive macros to `speedy::Readable` and `speedy::Writable`.
- A `Motoko` version of `ic-stable-memory` would now be very hard to implement. I didn't have a plan to do it anyway, but without `Candid` it would take too much effort to even try.
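The derive switch mentioned above can be sketched like this (the `User` struct is hypothetical; assumes the `speedy` crate is in your dependencies):

```rust
use speedy::{Readable, Writable};

// Before, for Candid-based stable storage:
// #[derive(CandidType, Deserialize)]
// struct User { name: String, age: u8 }

// After - the same struct, made Speedy-compatible:
#[derive(Readable, Writable)]
struct User {
    name: String,
    age: u8,
}
```

You still keep `CandidType`/`Deserialize` on the types that cross the canister boundary; only the types stored in stable memory need the `Speedy` derives.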
Good:
- `Speedy` is a lot faster than `serde` (up to x10 in some benchmarks) and has a better memory footprint than `Candid` (it does not store the `DIDL` magic string or anything like that). In other words, it just works better for storage-related operations (but `Candid` is still the top choice when it comes to networking and interoperability).
- IMO `Speedy` is easier to work with. It has the same kind of `derive` macro system, but it is easier to implement these macros by hand when you really need to for some reason.
2. “Memory low” handler system
Previously, every function of `ic-stable-memory` that allocates new stable memory returned `Result<T, OutOfMemory>`. You could use this return value to understand whether the canister's memory was exhausted. This worked badly for two reasons:
- Poor API. Basically, you had to append `.unwrap_or_else(...)` to each allocating method of any stable collection (e.g. `svec.push()`) in order to catch this `OutOfMemory` error. The code looked bloated with this garbage, which hurt readability.
- On my side, I had to implement stable collections in a transactional fashion, so each method, if it threw an error, would reset to the previous state. For some collections this was really easy to achieve, but for others (like a btree map) it was a pain.
Now I have removed this `OutOfMemory` from the user-facing API completely. Instead, the library will always make sure that you have some stable memory available. Once your canister reaches the memory limit (but still has some free memory left), a special user-defined handler update function `on_low_stable_memory()` will be called. In the body of this function you can do anything you want: spawn a new canister, send an e-mail to the developer and so on. Once the memory is really over, the library will just trap, keeping the state safe from corruption.
As a breaking change, this will require some work from you to support. All you have to do is move your `OutOfMemory` reaction logic to `on_low_stable_memory()` and clean your code of every `unwrap()` or `expect()` that followed your stable collection method invocations.
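A minimal sketch of such a handler (assumes `ic-cdk`; the reaction logic here is entirely hypothetical - only the `on_low_stable_memory` name comes from the library):

```rust
use ic_cdk_macros::update;

// Called by ic-stable-memory when the canister nears its stable memory limit
// but still has some headroom left. React however fits your app: spawn a
// sibling canister, notify an operator, flip a read-only flag, etc.
#[update]
fn on_low_stable_memory() {
    // Hypothetical reaction: log and let an off-chain monitor pick it up.
    ic_cdk::println!("stable memory is almost full");
}
```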
The GitHub readme file now has a section about this mechanism.
Other updates
- `SBTreeMap` and `SBTreeSet` are now available as stable collections. They are pretty slow at the moment, but they work. The total list of supported collections: `SVec`, `SBinaryHeap`, `SHashMap`, `SHashSet`, `SBTreeMap` and `SBTreeSet`.
- It is possible to use `ic-stable-memory` with the stable Rust channel (nightly is no longer a requirement).
- Optimizations. All stable collections (except for BTree-based ones) now work 20-40 times faster than before (the repo also contains benchmarks now).
- Stability improvements.
What’s next
- Update the tutorials to reflect these new details (UPD: done).
- Focus on stability and performance. The goal is to reach a state where it is economically reasonable to use `ic-stable-memory` instead of the standard approach with horizontal scaling and the `pre_upgrade`/`post_upgrade` routine.
Wow, the improvements look super nice! I'll definitely give this lib a try for a new storage project soon!
One question tho: what was the incentive to use speedy over another serializer like MessagePack (which Distrikt, OpenChat and Dcvr already use extensively), for example? In my findings it gives around the same performance improvements.
Great to hear that, thanks!
I used benchmarks as the main source of info for this decision. For example, this one says that speedy is much faster than rmp and has a better memory footprint.
I like this benchmark repo because they compare against Abomonation, which I find very reasonable, since you just can't make a faster thing than Abomonation.
But, like the song goes: in the end it doesn’t even matter. Any storage-focused library would work just fine and will get its job done. Thanks for your feedback on that one!
I’ve been playing with Rust stable memory for some sample dapps, so very eager to add this to my list. Thanks!
Definitely trying this.
Thank you for enabling Rust stable channel support.
@senior.joinu I tried it and ran into this bug.
Opened an issue for it here
There’s a minimal reproduction in the issue. Let me know if there’s any other detail I can provide
Hey there!
Fixed. Check the issue for more details.
Hi @senior.joinu, I ran into another issue, with `SPrincipal`. I created a separate issue for it here with a minimal reproduction.
Let me know if there are any further details I can provide.
Hello again!
New version of the library with this issue fixed is available.
Thanks!
Thank you. Works fine as far as I’ve tested.
0.4.0-rc1
Hey there! Huge stability, performance & other updates here.
`ic-stable-memory 0.4.0` release candidate #1 is now published.
The goal for this version was to improve the library so that it would be reasonable to use it instead of the standard collections. Previous versions had a lot of problems with performance (some collections were 1.5k times slower than the standard ones). But this time everything is much better.
Cons
First, let’s talk about stuff that some might find upsetting.
- Collections from `0.4.0-rc1` are not backwards compatible with older ones. This is because they now internally use a custom super-fast (up to 5 times faster than Speedy) serialization engine, and because of that they store data differently. If you need a migration guide, please let me know (but I hope that most people simply don't use `ic-stable-memory` in production).
- `ic-stable-memory` is now a nightly-only library, because it strongly relies on a couple of unstable features, namely `generic_const_exprs` and `thread_local`. So now, in order to use the library, you have to install the `rust 1.66 nightly toolchain` and also append these lines to the beginning of your `lib.rs`:
#![feature(thread_local)]
#![feature(generic_const_exprs)]
- `ic-stable-memory` no longer takes care of horizontal scaling - I found out that developers themselves should be able to do a much better job of it for their exact use-case. So the `max_allocation_pages` and `max_grow_pages` settings are removed, as is the `on_low_stable_memory` hook. See this thread for more info on how to enable horizontal scaling for your canister.
- Most collections (except `SBTreeMap` and `SBTreeSet`) are no longer "infinite" - they are now limited to `2**32` elements total. I'm planning on adding an "infinite" `SVec` back in the near future.
Pros
Performance improvements
Vec
Before
"Classic vec push" 1000000 iterations: 463 ms
"Stable vec push" 1000000 iterations: 22606 ms (x49 slower)
"Classic vec pop" 1000000 iterations: 406 ms
"Stable vec pop" 1000000 iterations: 11338 ms (x28 slower)
"Classic vec search" 1000000 iterations: 127 ms
"Stable vec search" 1000000 iterations: 2926 ms (x23 slower)
After
"Classic vec push" 1000000 iterations: 46 ms
"Stable vec push" 1000000 iterations: 212 ms (x4.6 slower)
"Classic vec search" 1000000 iterations: 102 ms
"Stable vec search" 1000000 iterations: 151 ms (x1.4 slower)
"Classic vec pop" 1000000 iterations: 48 ms
"Stable vec pop" 1000000 iterations: 148 ms (x3 slower)
Hash map
Before
"Classic hash map insert" 100000 iterations: 224 ms
"Stable hash map insert" 100000 iterations: 7199 ms (x32 slower)
"Classic hash map remove" 100000 iterations: 123 ms
"Stable hash map remove" 100000 iterations: 3618 ms (x29 slower)
"Classic hash map search" 100000 iterations: 69 ms
"Stable hash map search" 100000 iterations: 2325 ms (x34 slower)
After
"Classic hash map insert" 100000 iterations: 96 ms
"Stable hash map insert" 100000 iterations: 387 ms (x4 slower)
"Classic hash map search" 100000 iterations: 47 ms
"Stable hash map search" 100000 iterations: 113 ms (x2.4 slower)
"Classic hash map remove" 100000 iterations: 60 ms
"Stable hash map remove" 100000 iterations: 99 ms (x1.6 slower)
Hash set
Before
"Classic hash set insert" 100000 iterations: 209 ms
"Stable hash set insert" 100000 iterations: 5977 ms (x28 slower)
"Classic hash set remove" 100000 iterations: 180 ms
"Stable hash set remove" 100000 iterations: 2724 ms (x15 slower)
"Classic hash set search" 100000 iterations: 125 ms
"Stable hash set search" 100000 iterations: 2007 ms (x16 slower)
After
"Classic hash set insert" 100000 iterations: 79 ms
"Stable hash set insert" 100000 iterations: 394 ms (x4.9 slower)
"Classic hash set search" 100000 iterations: 54 ms
"Stable hash set search" 100000 iterations: 97 ms (x1.8 slower)
"Classic hash set remove" 100000 iterations: 56 ms
"Stable hash set remove" 100000 iterations: 99 ms (x1.7 slower)
BTree map
Before
"Classic btree map insert" 10000 iterations: 31 ms
"Stable btree map insert" 10000 iterations: 8981 ms (x298 slower)
"Classic btree map remove" 10000 iterations: 17 ms
"Stable btree map remove" 10000 iterations: 19831 ms (x1166 slower)
"Classic btree map search" 10000 iterations: 15 ms
"Stable btree map search" 10000 iterations: 20710 ms (x1380 slower)
After
"Classic btree map insert" 100000 iterations: 267 ms
"Stable btree map insert" 100000 iterations: 17050 ms (x63 slower)
"Classic btree map search" 100000 iterations: 138 ms
"Stable btree map search" 100000 iterations: 566 ms (x4.1 slower)
"Classic btree map remove" 100000 iterations: 147 ms
"Stable btree map remove" 100000 iterations: 1349 ms (x9.1 slower)
BTree set
Before
"Classic btree set insert" 10000 iterations: 26 ms
"Stable btree set insert" 10000 iterations: 8920 ms (x343 slower)
"Classic btree set remove" 10000 iterations: 13 ms
"Stable btree set remove" 10000 iterations: 19601 ms (x1507 slower)
"Classic btree set search" 10000 iterations: 16 ms
"Stable btree set search" 10000 iterations: 20569 ms (x1285 slower)
After
"Classic btree set insert" 100000 iterations: 312 ms
"Stable btree set insert" 100000 iterations: 1771 ms (x5.6 slower)
"Classic btree set search" 100000 iterations: 170 ms
"Stable btree set search" 100000 iterations: 600 ms (x3.5 slower)
"Classic btree set remove" 100000 iterations: 134 ms
"Stable btree set remove" 100000 iterations: 1317 ms (x9.8 slower)
Binary heap
Before
"Classic binary heap push" 1000000 iterations: 995 ms
"Stable binary heap push" 1000000 iterations: 29578 ms (x29 slower)
"Classic binary heap pop" 1000000 iterations: 4453 ms
"Stable binary heap pop" 1000000 iterations: 27159 ms (x6 slower)
"Classic binary heap peek" 1000000 iterations: 133 ms
"Stable binary heap peek" 1000000 iterations: 3314 ms (x25 slower)
After
"Classic binary heap push" 1000000 iterations: 461 ms
"Stable binary heap push" 1000000 iterations: 11668 ms (x25 slower)
"Classic binary heap peek" 1000000 iterations: 62 ms
"Stable binary heap peek" 1000000 iterations: 144 ms (x2.3 slower)
"Classic binary heap pop" 1000000 iterations: 715 ms
"Stable binary heap pop" 1000000 iterations: 16524 ms (x23 slower)
As you can see, in some cases the performance gains are obvious, but in others (like `SBinaryHeap`) not much has changed. This is because some algorithms are sensitive to indirect data and some aren't. Let me explain what I mean.
Direct & indirect storage
Previously, all collections in `ic-stable-memory` were indirect - for each element they allocated a new portion of stable memory to store it, while internally they only stored a pointer to that stable memory. This is what allowed these collections to hold any kind of dynamically sized data and to migrate that data to a new version (to add new fields) on-the-fly. But it was also a big problem from the performance perspective, since such a flow requires (de)allocating and (de)serializing each element.
This is why all stable collections are now optionally indirect. There is a special trait `StableAllocated`. This trait should only be implemented for fixed-size data (for something that could easily implement `Copy`). If your data implements this trait, you can pass it straight to a stable collection like so:
let vec = SVec::<MyData>::new();
and the collection will work in high-performance direct mode, storing data inline, right inside the collection. In this case your data will be serialized with the custom serialization engine that I mentioned before. This engine serializes everything without leaving the stack, straight into a fixed-size byte array, which makes it a lot faster than any other serialization engine, since all of them serialize into a heap-allocated buffer.
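The idea behind that engine can be illustrated in plain Rust - a fixed-size value is encoded into a `[u8; N]` that lives entirely on the stack, with no intermediate `Vec` (this is an illustrative sketch, not the library's actual code):

```rust
// A fixed-size record: two u64 fields -> always exactly 16 bytes.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Point {
    x: u64,
    y: u64,
}

impl Point {
    // Serialize straight into a stack-allocated array - no heap allocation.
    fn to_bytes(self) -> [u8; 16] {
        let mut buf = [0u8; 16];
        buf[..8].copy_from_slice(&self.x.to_le_bytes());
        buf[8..].copy_from_slice(&self.y.to_le_bytes());
        buf
    }

    fn from_bytes(buf: [u8; 16]) -> Self {
        let mut x = [0u8; 8];
        let mut y = [0u8; 8];
        x.copy_from_slice(&buf[..8]);
        y.copy_from_slice(&buf[8..]);
        Point { x: u64::from_le_bytes(x), y: u64::from_le_bytes(y) }
    }
}
```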
But if your data is dynamically sized (or you plan to update it to a new version one day, adding new fields), then you have to store it indirectly. For that there is a couple of new smart pointers: `SBox` and `SBoxMut`:
let vec = SVec::<SBox<MyData>>::new();
In that case, the collection will, as before, allocate a new portion of stable memory somewhere else and internally store only a pointer to that data. You can wrap any data that implements `speedy::Readable` and `speedy::Writable` in an `SBox` or `SBoxMut` - `speedy` is responsible for the serialization in this case.
This indirection polymorphism feature took me the longest to implement, and this is why the performance gains might not yet look like a big deal to some of you. Internally, the collections have changed drastically. This will allow for better optimizations in the future, so don't get upset - from now on the library will only become faster and faster.
Stability
Test coverage is increased up to 95%, and in reality it is a couple of percent higher, since `cargo tarpaulin` often shows some trivial lines as uncovered while they are definitely covered (try it out yourself). I found and fixed a lot of bugs during this work, and I hope the library is now much safer to use than it was before.
But it is still a young software, please don’t use it in production yet!
Memory fragmentation
The allocator now also works a lot faster, and it can reallocate memory in place, which makes it a lot easier to resist fragmentation. Not much to say here. I ran some tests internally, and it appears that fragmentation for a random dataset is not a big issue (around 5%).
QoL improvements
All collections now also have an `.iter()` method, which returns an iterator over that collection. For most of them this is a zero-cost, very efficient abstraction, but not for BTree-based collections - their iterators are pretty heavy and should only be used on small collections.
`SVec` now has a lot of additional methods, like `insert()`, `remove()`, `from::<Vec<_>>()` and `binary_search_by()`.
There is also an additional macro for "un-declaring" a stable variable - `s_remove!(Variable)` - which removes the variable from the registry and returns its content. If the data that was stored there is droppable (if it is a stable collection), then you have to manually release the memory by calling the `.stable_drop()` method on it:
let vec = s_remove!(MyStableVector);
unsafe { vec.stable_drop() };
Refactors
Once you try it out, you'll notice that some methods have changed their names (`.drop()` is now `.stable_drop()`, `.get_cloned()` is now `.get_copy()`, and so on). Some methods that were previously safe (`.from_ptr()` or `.drop()`) are now marked as unsafe. This is because I found the new names and signatures more descriptive and decided to change them while it is still possible to do so without hurting too many people.
What’s next?
I decided to add the `-rc1` postfix in order to focus your attention once again on the fact that this library is not yet ready to be used in production. My goal is to make version `0.4` as good as possible before actually releasing it - to make it a suitable replacement for the standard collections.
Release candidate #2 will be focused on some (probably AVL) augmented binary Merkle tree that will allow users of `ic-stable-memory` to easily certify the data they store.
I also have a plan to add an infinite `SVec` back (it will probably be named `SRope`), as well as to figure out a way to improve the `BTree`-based collections and the `BinaryHeap`, since their performance still seems poor to me. This amount of work looks to me like another release-candidate milestone.
After that there may be more release candidates, with the main goal of stabilizing the library as much as possible before telling everyone: "Yeah, you can use it in your canister".
Thanks for reading this far. Go try it out and tell me what you think!
By the way, the code for `0.4.0-rc1` is in `develop`, not in `master`.
Thank you so much for the amazing updates.
A migration path for canisters using `ic-stable-memory` v0.2.* would be amazing.
I am not completely sure how this works. An example would be really helpful.
If I have a struct `Profile` with `name` and then want to add another field `age`, how would the already existing `Profile` structs co-exist with the new two-field `Profile` in the same stable collection? If I get the value of an older struct, what would be returned in the `age` field?
I imagine any data structure containing a `String` type can't utilize this, right?
In that case, looking at the scope, shouldn’t the dynamically sized data option be the default?
Thanks! Sure, you can expect such a guide in a couple of days.
You just have to declare your structures a little bit differently. Imagine initially you have something like this:
#[derive(Readable, Writable)]
enum User {
    V1(UserV1)
}

#[derive(Readable, Writable)]
struct UserV1 {
    name: String,
    age: u8,
}
And you store those structs in some kind of stable collection:
type UsersMap = SHashMap<SPrincipal, SBox<User>>;
...
let mut users = s!(UsersMap);
let user_box: SBox<User> = users.get_copy(user_principal).unwrap();
// SBox and SBoxMut implement Deref, so you can match on the dereferenced value
match &*user_box {
    User::V1(user) => {
        // do stuff
    }
}
Since you’re storing users indirectly, you can simply declare a new version like this:
#[derive(Readable, Writable)]
enum User {
    V1(UserV1),
    V2(UserV2),
}

...

#[derive(Readable, Writable)]
struct UserV2 {
    name: String,
    age: u8,
    phone: String,
}
And use it in your code like this:
match &*user_box {
    User::V1(user) => {
        // do stuff
    },
    User::V2(user) => {
        // do stuff with the phone number also
    }
}
This is both safe and pretty easy to do.
It depends. First of all, if your string is limited in size (for example, it can't be bigger than 128 bytes), then you can simply convert your string to a `[u8; 128]` and store it that way. `StableAllocated` is implemented by default for all primitive numeric types and for `[u8]` arrays with a length equal to a power of two up to 4096 bytes (`0, 1, 2, 4, 8, 16, 32, 64 ... 4096`). Once `generic_const_exprs` gets more stable, it will be possible to implement it generically for `[u8]` arrays of any size.
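Such a conversion might look like this (an illustrative sketch in plain Rust; the helper names are made up, and note that this padding scheme assumes the string contains no NUL bytes):

```rust
// Pack a string of at most 128 bytes into a fixed-size, stack-friendly array.
// Returns None if the string does not fit.
fn pack_str(s: &str) -> Option<[u8; 128]> {
    let bytes = s.as_bytes();
    if bytes.len() > 128 {
        return None;
    }
    let mut buf = [0u8; 128];
    buf[..bytes.len()].copy_from_slice(bytes);
    Some(buf)
}

// Recover the string by trimming the zero padding.
fn unpack_str(buf: &[u8; 128]) -> String {
    let end = buf.iter().position(|&b| b == 0).unwrap_or(128);
    String::from_utf8_lossy(&buf[..end]).into_owned()
}
```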
Second, you still can split your struct into a couple of structs - detach a fixed-size part from the dynamically-sized one. Imagine you have these structs:
struct TransactionHistoryEntryBase {
    from: SPrincipal, // SPrincipal also implements StableAllocated
    to: SPrincipal,
    amount: SNat,     // SNat also implements StableAllocated
}

// there is no derive macro for this yet
impl StableAllocated for TransactionHistoryEntryBase {
    ...
}
#[derive(Readable, Writable)]
struct TransactionHistoryEntryDetails {
    memo: String,
    refs: Vec<u64>,
}
Together they represent a single business entity, `TransactionHistoryEntry`, but you can store the two parts in different locations:
type THEBaseLedger = SVec<TransactionHistoryEntryBase>;
type THEDetailsLedger = SVec<SBox<TransactionHistoryEntryDetails>>;
and you can access them separately. You'll still be able to update this data to a new version (by modifying the dynamically-sized part as I showed earlier), but at least some part of your data will be optimized.
And the third option is, as you said, to store everything indirectly using `SBox`-es.
About making the indirect flow the default one - yes, maybe you're right, but I'm not sure yet. What I didn't like about `0.2.0` is that this indirection of storage was hidden from the user. I find that very misleading - developers should know what their code is doing and how it works internally, preferably without having to read through the documentation. This is what I always liked about Rust itself - by default it always does things in the most optimal way.
In Rust there is also a notion of indirection. By default, when you allocate some data, and that data is sized, it will automatically be allocated on the stack (directly). Only if you explicitly ask for it to be allocated on the heap (switching to indirect mode) by putting it into a `Box` will that happen. My motivation was to replicate this behavior and make the library more idiomatic this way.
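The analogy in plain Rust (a trivial sketch; the struct and helper are made up):

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Fixed {
    a: u64,
    b: u64,
}

// Sized data lives directly on the stack by default (direct mode);
// an explicit Box moves it behind a pointer on the heap - the
// "indirect" mode that SBox mirrors for stable memory.
fn into_indirect(value: Fixed) -> Box<Fixed> {
    Box::new(value)
}
```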
Thank you for the explanations.
Makes perfect sense
@senior.joinu this looks amazing!
I’m keen to move all of the data in OpenStorage (OpenChat’s storage service) into stable memory.
One option is to use the StableStructures built by Dfinity, but that doesn’t work nicely when your values vary in size massively. I’d need to create buckets of increasing size which I want to avoid due to added complexity + memory wasted.
Our keys are all 256-bit hashes and our values are vecs of bytes of varying sizes.
It looks as if your SHashMap fits our use case perfectly.
Would you recommend using this in production yet?
Hey, great to hear that!
Yea, a hashmap would work, but if this collection is infinite, a btreemap would be better in the long run.
Unfortunately, not yet. It needs a bit more time for stabilization until I could recommend it for production.
I’m guessing you recommend BTreeMap because HashMap has to rehash all of the values every so often as items are inserted? Or is there another reason?