Ic-stable-memory rust library

It will never be perfect, but now it’s good :slight_smile:

:exclamation: ic-stable-memory library has been promoted to v0.4.1 :exclamation:

This is the first release that I consider safe to use. A lot of work was done to ensure it works properly and the code is sound. I’ve spent a lot of time trying to eliminate any possibility of memory leaks or unexpected panics. In fact, I’m so sure about it, that I’m going to start a new project, which is a dapp that will use this library to store everything in stable memory.

The library has changed a lot, so let me walk you through the update.

1. Enforced Rust’s ownership rules

This is the set of changes, that I come up with very recently and nobody was aware I’m doing it, but it is so cool, that I simply can’t start from anything but this.

If you remember, previously each data structure in ic-stable-memory had a special stable_drop() method, that was used to manually release stable memory, when it no longer needed. In other words, stable collections worked like they were written in C/C++ and not in Rust. This was a serious source of errors for both: library users and me. In fact, the only one team that was using ic-stable-memory previously lost all of their data because of a memory leak.

This is no longer an issue though. I’ve managed to “connect” stable-allocated values with Rust’s borrower, so now they follow the same ownership rules as regular values. Let’s see an example:

{
  let mut vec = SVec::<u64>::new();

  for i in 0..100 {
    vec.push(i).expect("out of memory"); // <- more on this below
  }
} // <- `vec` gets stable-dropped, releasing all the memory automatically

Another example:

thread_local! {
  static STATE: RefCell<Option<SVec<SBox<String>>>> = RefCell::default();
}

#[update]
fn push_string(str: String) {
  STATE.with(|s| {
    let boxed_str = SBox::new(str).expect("out of memory"); 
    
    s
      .borrow_mut()
      .as_mut()
      .unwrap()
      .push(boxed_str)          // <- `boxed_str` will NOT get stable-dropped, 
      .expect("out of memory"); // because it is now owned by STATE
  });
}

I know this may not sound as impressive as I’m trying to convince you it is, but this is actually a game-changer in my opinion.

This new set of rules is enforced on runtime. Yes, you lose a tiny bit of performance, but in exchange you can’t leak stable memory anymore. This is it. The library now provides a guarantee, that while you’re using only those functions which are not unsafe, you can’t leak memory.

Also, have you noticed in the example above, how you could easily swap SVec with simple Vec and it would continue to be a valid program? This is what following the rules also gives for free - the library now looks and feels like something from Rust’s ecosystem. It is now a continuation of the language and not an appendix.

In order to contribute even more into this vibe I also introduced a notion of stable reference. Each stable collection now returns a smart-pointer to the data it owns, instead of a copy of that data.

See this example:

let mut map_of_stuff = SBTreeMap::<u64, u64>::new();

let value: SRefMut<u64> = map_of_stuff.get_mut(&10).unwrap();

*value = 15; // <- this is a valid code

These stable references are lifetime-aware, so it is no longer possible to concurrently mutate the same
data structure twice - the compiler simply won’t let that happen.

Also, I removed stable variables from the library. Now you don’t need to use this ugly s! {} syntax anymore, you can simply use the same techniques you use with std collections, like using thread_local!.

All together these changes make it feel like you’re just using common data structures, common Rust, despite the fact that there is a whole custom memory allocator works under the hood and the data is actually not in the stack or not in the heap, but in some other place that only has read_bytes and write_bytes methods to it.

2. New stable memory allocator

By the way, about the allocator. I rewrote it completely from scratch to make it more flexible and efficient.

The old one was O(n) for allocations and O(1) for deallocations, where n is the amount of free blocks in the free-list. The new one is O(logn) for both: allocations and deallocations. This should help with the performance of SBox-ed values. Also, this new allocator can allocate stable memory blocks up to u64::MAX bytes long, which is important for some optimizations.

3. Ready for horizontal scaling

Two previous changes allowed me to re-introduce programmatic handling of OutOfMemory errors. Now, any method of the API, that potentially can allocate stable memory returns Result, where Err variant means, that your subnet is out of stable memory and you have to do something about it.

This feature was in the library from day 0 (and it is actually one of the main reasons, why I wanted to build it), but was removed, when it became clear, that it is impossible to use in complex projects. This is because, when you catch this error and don’t panic, you have to reset the changes that you already did to your canister manually. This was very hard, mainly because of stable_drop() that you have to call on each thing that you’ve created so far. But now, when values stable-drop themselves automatically, this is no longer an issue.

This means, that you can actually program your canister to automatically scale horizontally, when it occupies all available stable memory!

  STATE.with(|s| {
    let boxed_str = SBox::new(str).expect("out of memory"); 
    
    let res = s
      .borrow_mut()
      .as_mut()
      .unwrap()
      .push(boxed_str);

    if res.is_err() {
      scale_horizontally().await;  // <- like this, boxed_str will stable-drop automatically
    }
  });

4. Stable certified map

ic-stable-memory now also provides a certified stable collection SCertifiedBTreeMap. This is a Merkle tree built on top of a B+ tree. What’s more important is that its proofs are compatible with proofs you get from from Dfinity’s RBTree.

I’ve prepared a demo project that is certified assets canister fork, but using only data structures from ic-stable-memory. You can find it here and play with it yourself. I tried it on main-net and was able to render frontend with valid certificates and stuff.

It supports the following proofs:

  • set (in this case - map) membership proof
  • absence proof
  • range proof

Also it is written in such a way, so map mutation and Merkle-tree recalculation are separated. This allows doing multiple update operations in batch, but recompute the underlying Merkle tree only once, which can greatly increase the performance in some cases.

By the way, here are benchmarking metrics in a real canister using performance counter API comparing to the RBTree:

Insert 5_000 entries

rbtree -> `5627092211`
scertifiedbtreemap -> `9108725043` - x1.6 slower
scertifiedbtreemap (in batches of 10) -> `1354608056` - x4.1 faster

Witness 5_000 entries

rbtree -> `3273570622`
scertifiedbtreemap -> `3541619761` - x1.08 slower

Remove 5_000 entries

rbtree -> `9359364040`
scertifiedbtreemap -> `6693095737` - x1.4 faster
scertifiedbtreemap (in batches of 10) -> `731156025` - x12.8 faster

As we can see, batching is totally killing it.

5. Stable data structures tweaks

A lot of work was done to improve stability and performance of existing data structures. You can find actual performance benchmarks here. Here are some short outlines:

  • Completely reworked SBTreeMap. It is now only x2-x3 times slower that the std’s one.
  • SHashMap now uses zwohash and performs deletions without tombstones, which make it keep the performance in shape and not degrade over time. Also, it is now dangerously close in performance with the std’s HashMap.
  • Added new SLog collection, which is an infinite analog to SVec very well optimized for the most recent entries. Should be very handy for storing logs and histories.
  • Removed SBinaryHeap, since SBTreeMap is now simply way faster than my previous binary heap implementation was and I’m out of capacity at the moment to keep it up with the rest of collections.
  • All keyed collections (maps and sets) are now accepting Borrow-ed keys for searching. For example, if you have a map like this SHashMap<SBox<String>>, u64> you can search it simply by &String and not by &SBox<String>.

6. Goodbye Speedy

Yes, Speedy is no longer used. And you’re not longer forced to use any serialization engine. In fact, the library is now uses two types of serialization. You can find out more in this article.

7. Documentation

Yes, the library is finally covered with documentation. There is a complete API documentation on docs.rs and also there are a bunch of tutorials on some important topics like upgradability or horizontal scaling or performance. You can find all of them in the Github repo.

8. Tests

I never in my life wrote so many tests. There are currently around ~120 tests, many of which are randomized. Some of them are completely fuzzy, randomizing even what kind of action they would do with the data structure next. The whole test suite runs about 20-25 minutes on my machine.

In other words, I did my best, guys. I really want it to be as safe to use as possible. But I also want it to be clear, that I’m the only one who tested it and I’m just a human and bugs may appear anyway. Please let me know, if you’ll find one.

What’s next

As I said earlier, I consider this library as a finished product that can be used as-is. It doesn’t mean, that I won’t add anything new to it or fix bugs - I will. But from now on my work on this library will be purely reactive - if you need something, let me know and together we’ll make it happen. If you have questions or want me to create a tutorial on how to handle a particular situation - let me know and we’ll figure it out together.

In other words, in order to improve this library, I now really need your feedback, guys.

Hope you’re having a great day!
Reach me out here, in this forum thread, or on Github issues.

@domwoe @dymayday @saikatdas0790 @cymqqqq @hpeebles

11 Likes