# 0.4.0-rc1

Hey there! Huge stability, performance & other updates here. `ic-stable-memory` 0.4.0 release candidate #1 is now published.
The goal for this version was to improve the library so that it would be reasonable to use it instead of the standard collections. Previous versions had a lot of performance problems (some collections were up to 1.5k times slower than the standard ones), but this time everything is much better.
## Cons
First, let’s talk about stuff that some might find upsetting.
- Collections from 0.4.0-rc1 are not backwards compatible with older ones. This is because they now internally use a custom super-fast (up to 5 times faster than Speedy) serialization engine, and because of that they store data differently. If you need a migration guide, please let me know (but I hope that most people simply don't use `ic-stable-memory` in production).
- `ic-stable-memory` is now a nightly-only library, because it strongly relies on a couple of unstable features, namely `generic_const_exprs` and `thread_local`. So now, in order to use the library, you have to install the Rust 1.66 nightly toolchain and also append these lines to the beginning of your `lib.rs`:

  ```rust
  #![feature(thread_local)]
  #![feature(generic_const_exprs)]
  ```
- `ic-stable-memory` no longer takes care of horizontal scaling - I found out that developers themselves should be able to do a much better job at it for their exact use case. So the `max_allocation_pages` and `max_grow_pages` settings are removed, as well as the `on_low_stable_memory` hook. See this thread for more info on how to enable horizontal scaling for your canister.
- Most collections (except `SBTreeMap` and `SBTreeSet`) are no longer "infinite" - they are now limited to `2**32` elements total. I'm planning on adding an "infinite" `SVec` back in the near future.
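If you want to pin the nightly requirement per-project rather than switching your global toolchain, rustup supports a `rust-toolchain.toml` file in the project root. A minimal sketch (the exact nightly date you pin is up to you; any nightly that ships the two features above will do):

```toml
# rust-toolchain.toml - makes rustup select nightly automatically for this project
[toolchain]
channel = "nightly"
```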
## Pros

### Performance improvements
#### Vec

Before:

```
"Classic vec push" 1000000 iterations: 463 ms
"Stable vec push" 1000000 iterations: 22606 ms (x49 slower)
"Classic vec pop" 1000000 iterations: 406 ms
"Stable vec pop" 1000000 iterations: 11338 ms (x28 slower)
"Classic vec search" 1000000 iterations: 127 ms
"Stable vec search" 1000000 iterations: 2926 ms (x23 slower)
```

After:

```
"Classic vec push" 1000000 iterations: 46 ms
"Stable vec push" 1000000 iterations: 212 ms (x4.6 slower)
"Classic vec search" 1000000 iterations: 102 ms
"Stable vec search" 1000000 iterations: 151 ms (x1.4 slower)
"Classic vec pop" 1000000 iterations: 48 ms
"Stable vec pop" 1000000 iterations: 148 ms (x3 slower)
```
#### Hash map

Before:

```
"Classic hash map insert" 100000 iterations: 224 ms
"Stable hash map insert" 100000 iterations: 7199 ms (x32 slower)
"Classic hash map remove" 100000 iterations: 123 ms
"Stable hash map remove" 100000 iterations: 3618 ms (x29 slower)
"Classic hash map search" 100000 iterations: 69 ms
"Stable hash map search" 100000 iterations: 2325 ms (x34 slower)
```

After:

```
"Classic hash map insert" 100000 iterations: 96 ms
"Stable hash map insert" 100000 iterations: 387 ms (x4 slower)
"Classic hash map search" 100000 iterations: 47 ms
"Stable hash map search" 100000 iterations: 113 ms (x2.4 slower)
"Classic hash map remove" 100000 iterations: 60 ms
"Stable hash map remove" 100000 iterations: 99 ms (x1.6 slower)
```
#### Hash set

Before:

```
"Classic hash set insert" 100000 iterations: 209 ms
"Stable hash set insert" 100000 iterations: 5977 ms (x28 slower)
"Classic hash set remove" 100000 iterations: 180 ms
"Stable hash set remove" 100000 iterations: 2724 ms (x15 slower)
"Classic hash set search" 100000 iterations: 125 ms
"Stable hash set search" 100000 iterations: 2007 ms (x16 slower)
```

After:

```
"Classic hash set insert" 100000 iterations: 79 ms
"Stable hash set insert" 100000 iterations: 394 ms (x4.9 slower)
"Classic hash set search" 100000 iterations: 54 ms
"Stable hash set search" 100000 iterations: 97 ms (x1.8 slower)
"Classic hash set remove" 100000 iterations: 56 ms
"Stable hash set remove" 100000 iterations: 99 ms (x1.7 slower)
```
#### BTree map

Before:

```
"Classic btree map insert" 10000 iterations: 31 ms
"Stable btree map insert" 10000 iterations: 8981 ms (x298 slower)
"Classic btree map remove" 10000 iterations: 17 ms
"Stable btree map remove" 10000 iterations: 19831 ms (x1166 slower)
"Classic btree map search" 10000 iterations: 15 ms
"Stable btree map search" 10000 iterations: 20710 ms (x1380 slower)
```

After:

```
"Classic btree map insert" 100000 iterations: 267 ms
"Stable btree map insert" 100000 iterations: 17050 ms (x63 slower)
"Classic btree map search" 100000 iterations: 138 ms
"Stable btree map search" 100000 iterations: 566 ms (x4.1 slower)
"Classic btree map remove" 100000 iterations: 147 ms
"Stable btree map remove" 100000 iterations: 1349 ms (x9.1 slower)
```
#### BTree set

Before:

```
"Classic btree set insert" 10000 iterations: 26 ms
"Stable btree set insert" 10000 iterations: 8920 ms (x343 slower)
"Classic btree set remove" 10000 iterations: 13 ms
"Stable btree set remove" 10000 iterations: 19601 ms (x1507 slower)
"Classic btree set search" 10000 iterations: 16 ms
"Stable btree set search" 10000 iterations: 20569 ms (x1285 slower)
```

After:

```
"Classic btree set insert" 100000 iterations: 312 ms
"Stable btree set insert" 100000 iterations: 1771 ms (x5.6 slower)
"Classic btree set search" 100000 iterations: 170 ms
"Stable btree set search" 100000 iterations: 600 ms (x3.5 slower)
"Classic btree set remove" 100000 iterations: 134 ms
"Stable btree set remove" 100000 iterations: 1317 ms (x9.8 slower)
```
#### Binary heap

Before:

```
"Classic binary heap push" 1000000 iterations: 995 ms
"Stable binary heap push" 1000000 iterations: 29578 ms (x29 slower)
"Classic binary heap pop" 1000000 iterations: 4453 ms
"Stable binary heap pop" 1000000 iterations: 27159 ms (x6 slower)
"Classic binary heap peek" 1000000 iterations: 133 ms
"Stable binary heap peek" 1000000 iterations: 3314 ms (x25 slower)
```

After:

```
"Classic binary heap push" 1000000 iterations: 461 ms
"Stable binary heap push" 1000000 iterations: 11668 ms (x25 slower)
"Classic binary heap peek" 1000000 iterations: 62 ms
"Stable binary heap peek" 1000000 iterations: 144 ms (x2.3 slower)
"Classic binary heap pop" 1000000 iterations: 715 ms
"Stable binary heap pop" 1000000 iterations: 16524 ms (x23 slower)
```
As you can see, in some cases the performance gains are obvious, but for others (like `SBinaryHeap`) not much has changed. This is because some algorithms are sensitive to data indirection and some are not. Let me explain what I mean.
### Direct & indirect storage
Previously, all collections in `ic-stable-memory` were indirect - for each element they allocated a new portion of stable memory to store it, and internally they only stored a pointer to that stable memory. This is what allowed these collections to store any kind of dynamically sized data and to migrate this data to a new version (to add new fields) on-the-fly. But it was also a big problem from the performance perspective, since such a flow requires (de)allocating and (de)serializing each element.
This is why all stable collections are now optionally indirect. There is a special trait, `StableAllocated`, which should only be implemented for fixed-size data (for something that can easily implement `Copy`). If your data implements this trait, you can pass it straight to a stable collection like so:
```rust
let vec = SVec::<MyData>::new();
```
and the collection will work in high-performance direct mode, storing data inline, right inside the collection. In this case your data will be serialized with the custom serialization engine that I mentioned before. This engine serializes everything without leaving the stack, straight into a fixed-size byte array, which makes it a lot faster than any other serialization engine, since all of them serialize into a heap-allocated buffer.
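To make the idea concrete, here is a toy sketch of fixed-size, stack-only serialization. This is purely illustrative - the struct, method names, and layout below are made up and are not the library's actual engine or trait:

```rust
// Toy fixed-size serialization: every field is encoded at a known offset
// into a stack-allocated byte array, so no heap buffer is ever needed.
#[derive(Debug, PartialEq)]
struct MyData {
    id: u64,
    score: u32,
}

impl MyData {
    // Serialize straight into a fixed-size array on the stack.
    fn to_fixed_bytes(&self) -> [u8; 12] {
        let mut buf = [0u8; 12];
        buf[0..8].copy_from_slice(&self.id.to_le_bytes());
        buf[8..12].copy_from_slice(&self.score.to_le_bytes());
        buf
    }

    // Deserialize from the same fixed layout.
    fn from_fixed_bytes(buf: &[u8; 12]) -> Self {
        Self {
            id: u64::from_le_bytes(buf[0..8].try_into().unwrap()),
            score: u32::from_le_bytes(buf[8..12].try_into().unwrap()),
        }
    }
}

fn main() {
    let d = MyData { id: 42, score: 7 };
    let bytes = d.to_fixed_bytes();
    // Roundtrip through the fixed-size buffer loses nothing.
    assert_eq!(MyData::from_fixed_bytes(&bytes), d);
}
```

Because the element size is known at compile time, a collection can copy these bytes inline without any per-element allocation - which is exactly where the direct mode gets its speed.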
But if your data is dynamically sized (or you plan to update it to a new version one day, adding new fields), then you have to store it indirectly. For that there are a couple of new smart pointers, `SBox` and `SBoxMut`:
```rust
let vec = SVec::<SBox<MyData>>::new();
```
In that case, the collection will, as before, allocate a new portion of stable memory somewhere else and internally store only a pointer to that data. You can wrap any data that implements `speedy::Readable` and `speedy::Writable` in `SBox` or `SBoxMut` - `speedy` is responsible for the serialization in this case.
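The direct/indirect distinction can be sketched with a toy arena in plain Rust (again, this is not the library's API - the `Arena` type and its methods are invented for illustration):

```rust
// Toy model of indirect storage: elements live in a byte arena, and the
// collection itself only stores (offset, len) "pointers" into that arena.
// This is roughly what SBox-style wrappers do for dynamically sized data.
struct Arena {
    bytes: Vec<u8>,
}

impl Arena {
    fn new() -> Self {
        Self { bytes: Vec::new() }
    }

    // Allocate a slot for `data` and return its "pointer".
    fn alloc(&mut self, data: &[u8]) -> (usize, usize) {
        let offset = self.bytes.len();
        self.bytes.extend_from_slice(data);
        (offset, data.len())
    }

    fn read(&self, (offset, len): (usize, usize)) -> &[u8] {
        &self.bytes[offset..offset + len]
    }
}

fn main() {
    let mut arena = Arena::new();

    // Indirect storage: each push costs an allocation plus a copy, but
    // elements may have any size, and their layout can evolve over time.
    let mut indirect: Vec<(usize, usize)> = Vec::new();
    for s in ["short", "a much longer string"] {
        indirect.push(arena.alloc(s.as_bytes()));
    }

    assert_eq!(arena.read(indirect[1]), b"a much longer string");
}
```

Direct mode skips the arena entirely and copies fixed-size bytes inline - one fewer allocation and one fewer dereference per element, which is where the benchmark gap between the two modes comes from.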
This indirection polymorphism feature took me the longest to implement, and this is why the performance gains might not yet look like a big deal to some of you. Internally, the collections have changed drastically. This will allow for better optimizations in the future, so don't get upset - from now on the library will only become faster and faster.
### Stability
Test coverage has increased to 95%, and in reality it is a couple of percent higher, since `cargo tarpaulin` often shows some trivial lines as uncovered while they are definitely covered (try it out yourself). I found and fixed a lot of bugs during this work, and I hope the library is now much safer to use than it was before.
But it is still a young software, please don’t use it in production yet!
### Memory fragmentation
The allocator now also works a lot faster, and it can reallocate memory in-place, which makes it a lot easier to resist fragmentation. Not much to say here: I ran some tests internally, and it appears that fragmentation on a random dataset is not a big issue (around 5%).
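Why does in-place reallocation help against fragmentation? When a growing block can be extended where it sits, no hole is left behind at its old location. A minimal sketch with a toy bump allocator (invented for illustration; the real allocator is more sophisticated):

```rust
// Toy bump allocator: `end` is the high-water mark of used memory.
struct Bump {
    end: usize,
    capacity: usize,
}

struct Block {
    offset: usize,
    len: usize,
}

impl Bump {
    fn alloc(&mut self, len: usize) -> Option<Block> {
        if self.end + len > self.capacity {
            return None;
        }
        let b = Block { offset: self.end, len };
        self.end += len;
        Some(b)
    }

    // If `b` is the most recent allocation, grow it in place: no copy,
    // and no stale hole left behind at the old offset.
    fn realloc_in_place(&mut self, b: &mut Block, new_len: usize) -> bool {
        let is_last = b.offset + b.len == self.end;
        if is_last && b.offset + new_len <= self.capacity {
            self.end = b.offset + new_len;
            b.len = new_len;
            true
        } else {
            false // the caller would have to move the data elsewhere
        }
    }
}

fn main() {
    let mut bump = Bump { end: 0, capacity: 64 };
    let _a = bump.alloc(16).unwrap();
    let mut b = bump.alloc(8).unwrap();
    // The last block grows without moving - its offset stays at 16.
    assert!(bump.realloc_in_place(&mut b, 24));
    assert_eq!(b.offset, 16);
}
```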
### QoL improvements
All collections now also have an `.iter()` method, which returns an iterator over that collection. For most of them this is a very efficient, zero-cost abstraction, but not for the BTree-based collections - their iterators are pretty heavy and should only be used on small collections.
`SVec` now has a lot of additional methods, like `insert()`, `remove()`, `from::<Vec<_>>()` and `binary_search_by()`.
There is also an additional macro for "un-declaring" a stable variable - `s_remove!(Variable)` - which will remove the variable from the registry and return its content. If the data that was stored there is droppable (if it is a stable collection), then you have to manually release the memory by calling the `.stable_drop()` method on it:
```rust
let vec = s_remove!(MyStableVector);
unsafe { vec.stable_drop() };
```
### Refactors
Once you try it out, you'll notice that some methods have changed their names (`.drop()` is now `.stable_drop()`, `.get_cloned()` is now `.get_copy()`, and so on). Some methods that were previously safe (`.from_ptr()` or `.drop()`) are now marked as `unsafe`. This is because I found the new names and signatures more descriptive and decided to change them while it is still possible to do so without hurting too many people.
## What's next?
I decided to add this `-rc1` postfix in order to focus your attention once again on the fact that this library is not yet ready to be used in production. My goal is to make version 0.4 as good as possible before actually releasing it - to make it into a suitable replacement for the standard collections.
Release candidate #2 will be focused around some (probably AVL-based) augmented binary Merkle tree, which will allow users of `ic-stable-memory` to easily certify the data they store.
I also plan to add an infinite `SVec` back (it will probably be named `SRope`), as well as to figure out a way to improve the BTree-based collections and the binary heap, since their performance still seems poor to me. This amount of work looks to me like another release candidate milestone.
After that there may be more release candidates, with the main goal of stabilizing the library as much as possible before telling everyone: "Yeah, you can use it in your canister".
Thanks for reading this far. Go try it out and tell me what you think!
By the way, the code for 0.4.0-rc1 is in `develop`, not in `master`.