ic-stable-memory Rust library

Hi there, are there any updates on your project? :grinning:

Hello there!
Yes, I think I would need another week to release.

Found a way to finally make the library completely sound - so had to waste some time with the implementation.

3 Likes

Please take the time you need, this is a very important building block so it needs to have the best quality that it can get.
And quality takes time.

2 Likes

Hi, I’m here again, are there any updates this week? :grinning:

2 Likes

Hey there!
Sorry for taking so long and not standing by my own words.

Once I implemented this new complete-soundness feature, it revealed a lot of problems I was previously unaware of. These problems are now successfully solved.

But there are still a couple of things I want to do, before releasing:

  1. The new certified data structure has already received a lot of testing, but I want to double-check it by running it on the main-net.
  2. I want to update the documentation to reflect the latest changes and to provide a more in-depth view of the library.

Hope to finish with this soon! Stay tuned.

And thank you all for your support. Knowing that there are people who really want to use something you’ve made is an unbelievable motivation!

5 Likes

It will never be perfect, but now it’s good :slight_smile:

:exclamation: ic-stable-memory library has been promoted to v0.4.1 :exclamation:

This is the first release that I consider safe to use. A lot of work was done to ensure it works properly and the code is sound. I’ve spent a lot of time trying to eliminate any possibility of memory leaks or unexpected panics. In fact, I’m so sure about it that I’m going to start a new project: a dapp that will use this library to store everything in stable memory.

The library has changed a lot, so let me walk you through the update.

1. Enforced Rust’s ownership rules

This is a set of changes that I came up with very recently; nobody was aware I was working on it, but it is so cool that I simply can’t start with anything else.

If you remember, previously each data structure in ic-stable-memory had a special stable_drop() method that was used to manually release stable memory when it was no longer needed. In other words, stable collections worked as if they were written in C/C++ and not in Rust. This was a serious source of errors for both library users and me. In fact, the only team that was previously using ic-stable-memory lost all of their data because of a memory leak.

This is no longer an issue though. I’ve managed to “connect” stable-allocated values with Rust’s borrow checker, so now they follow the same ownership rules as regular values. Let’s see an example:

{
  let mut vec = SVec::<u64>::new();

  for i in 0..100 {
    vec.push(i).expect("out of memory"); // <- more on this below
  }
} // <- `vec` gets stable-dropped, releasing all the memory automatically

Another example:

thread_local! {
  static STATE: RefCell<Option<SVec<SBox<String>>>> = RefCell::default();
}

#[update]
fn push_string(str: String) {
  STATE.with(|s| {
    let boxed_str = SBox::new(str).expect("out of memory"); 
    
    s
      .borrow_mut()
      .as_mut()
      .unwrap()
      .push(boxed_str)          // <- `boxed_str` will NOT get stable-dropped, 
      .expect("out of memory"); // because it is now owned by STATE
  });
}

I know this may not sound as impressive as I’m trying to convince you it is, but this is actually a game-changer in my opinion.

This new set of rules is enforced at runtime. Yes, you lose a tiny bit of performance, but in exchange you can’t leak stable memory anymore. That’s it. The library now provides a guarantee that, as long as you only use functions which are not unsafe, you can’t leak memory.

Also, have you noticed in the example above how you could easily swap SVec for a simple Vec and it would still be a valid program? This is what following the rules also gives you for free - the library now looks and feels like something from Rust’s ecosystem. It is now a continuation of the language, not an appendix.

To lean even further into this vibe, I also introduced the notion of a stable reference. Each stable collection now returns a smart pointer to the data it owns, instead of a copy of that data.

See this example:

let mut map_of_stuff = SBTreeMap::<u64, u64>::new();

map_of_stuff.insert(10, 10).expect("out of memory"); // <- so get_mut() has something to find

let mut value: SRefMut<u64> = map_of_stuff.get_mut(&10).unwrap();

*value = 15; // <- this is valid code

These stable references are lifetime-aware, so it is no longer possible to concurrently mutate the same data structure twice - the compiler simply won’t let that happen.
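This is the same lifetime rule std collections already follow. A minimal std-only sketch with BTreeMap standing in for SBTreeMap (SRefMut behaves analogously to the `&mut` returned by `get_mut` here):

```rust
use std::collections::BTreeMap;

// Std analogue of the same rule: the reference returned by `get_mut`
// keeps the whole map mutably borrowed until it goes out of scope.
fn bump(map: &mut BTreeMap<u64, u64>) -> u64 {
    let value = map.get_mut(&10).unwrap();
    *value += 5;
    // a second `map.get_mut(..)` here would be rejected with error E0499
    *value
} // the borrow ends when `value` goes away
```

Once `bump` returns, the map is freely usable again - exactly the behavior described above.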

Also, I removed stable variables from the library. You no longer need the ugly s! {} syntax; you can simply use the same techniques you use with std collections, like thread_local!.

All together, these changes make it feel like you’re just using common data structures and common Rust, despite the fact that there is a whole custom memory allocator working under the hood, and the data actually lives neither on the stack nor on the heap, but in some other place that only exposes read_bytes and write_bytes methods.
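For intuition, that backing store can be thought of as just this kind of interface (names here are illustrative, not the library’s actual trait):

```rust
// Illustrative sketch of a raw byte-oriented backing store like the one
// described above: no stack, no heap, just offsets, reads and writes.
trait RawMemory {
    fn read_bytes(&self, offset: u64, buf: &mut [u8]);
    fn write_bytes(&mut self, offset: u64, data: &[u8]);
}

// An in-memory stand-in, handy for testing allocator logic off-chain:
struct VecMemory(Vec<u8>);

impl RawMemory for VecMemory {
    fn read_bytes(&self, offset: u64, buf: &mut [u8]) {
        let o = offset as usize;
        buf.copy_from_slice(&self.0[o..o + buf.len()]);
    }

    fn write_bytes(&mut self, offset: u64, data: &[u8]) {
        let o = offset as usize;
        self.0[o..o + data.len()].copy_from_slice(data);
    }
}
```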

2. New stable memory allocator

By the way, about the allocator. I rewrote it completely from scratch to make it more flexible and efficient.

The old one was O(n) for allocations and O(1) for deallocations, where n is the number of free blocks in the free-list. The new one is O(log n) for both allocations and deallocations. This should help with the performance of SBox-ed values. Also, this new allocator can allocate stable memory blocks up to u64::MAX bytes long, which is important for some optimizations.
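To see where O(log n) for both operations can come from, here is a sketch of the idea (not the library’s actual allocator): index free blocks by size in an ordered map, so a best-fit allocation becomes a single range query.

```rust
use std::collections::BTreeMap;

// Free blocks indexed by size: deallocation is an O(log n) insert,
// allocation is an O(log n) query for the smallest fitting block.
struct FreeList {
    by_size: BTreeMap<u64, Vec<u64>>, // size -> offsets of free blocks
}

impl FreeList {
    fn new() -> Self {
        Self { by_size: BTreeMap::new() }
    }

    fn deallocate(&mut self, offset: u64, size: u64) {
        self.by_size.entry(size).or_default().push(offset);
    }

    // Best fit: the smallest free block with size >= requested.
    fn allocate(&mut self, size: u64) -> Option<(u64, u64)> {
        let (&block_size, _) = self.by_size.range(size..).next()?;
        let offsets = self.by_size.get_mut(&block_size).unwrap();
        let offset = offsets.pop().unwrap();
        if offsets.is_empty() {
            self.by_size.remove(&block_size);
        }
        Some((offset, block_size))
    }
}
```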

3. Ready for horizontal scaling

The two previous changes allowed me to re-introduce programmatic handling of OutOfMemory errors. Now, any API method that can potentially allocate stable memory returns a Result, where the Err variant means that your subnet is out of stable memory and you have to do something about it.

This feature was in the library from day 0 (it is actually one of the main reasons why I wanted to build it), but it was removed when it became clear that it was impossible to use in complex projects. This is because, when you catch this error and don’t panic, you have to manually roll back the changes you’ve already made to your canister. This was very hard, mainly because of the stable_drop() you had to call on each thing you’d created so far. But now that values stable-drop themselves automatically, this is no longer an issue.

This means that you can actually program your canister to automatically scale horizontally when it occupies all available stable memory!

#[update]
async fn push_string(str: String) {
  let res = STATE.with(|s| {
    let boxed_str = SBox::new(str).expect("out of memory");

    s
      .borrow_mut()
      .as_mut()
      .unwrap()
      .push(boxed_str) // <- if this fails, `boxed_str` stable-drops automatically
  });

  if res.is_err() {
    // note: the `await` has to happen outside the `with` closure
    scale_horizontally().await;
  }
}

4. Stable certified map

ic-stable-memory now also provides a certified stable collection: SCertifiedBTreeMap. This is a Merkle tree built on top of a B+ tree. What’s more important is that its proofs are compatible with the proofs you get from Dfinity’s RBTree.

I’ve prepared a demo project that is a fork of the certified assets canister, but uses only data structures from ic-stable-memory. You can find it here and play with it yourself. I tried it on main-net and was able to render the frontend with valid certificates and everything.

It supports the following proofs:

  • set (in this case - map) membership proof
  • absence proof
  • range proof

Also, it is written in such a way that map mutation and Merkle-tree recalculation are separated. This allows you to perform multiple update operations in a batch but recompute the underlying Merkle tree only once, which can greatly increase performance in some cases.
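The effect of batching can be sketched with a toy model (hypothetical, not the library’s API): mutations touch no hashes, and the expensive root recomputation happens once per commit.

```rust
use std::collections::BTreeMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy model of "mutate in batch, certify once": inserts are cheap,
// commit() walks the entries and recomputes the root a single time.
struct ToyCertifiedMap {
    inner: BTreeMap<u64, u64>,
    root_hash: u64,
    recomputations: u64,
}

impl ToyCertifiedMap {
    fn new() -> Self {
        Self { inner: BTreeMap::new(), root_hash: 0, recomputations: 0 }
    }

    fn insert(&mut self, k: u64, v: u64) {
        self.inner.insert(k, v); // no hashing here
    }

    fn commit(&mut self) {
        let mut h = DefaultHasher::new();
        for entry in &self.inner {
            entry.hash(&mut h);
        }
        self.root_hash = h.finish();
        self.recomputations += 1;
    }
}
```

With a batch of N inserts followed by one commit, the full recomputation runs once instead of N times, which is where the batched numbers below come from.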

By the way, here are benchmarking metrics measured in a real canister using the performance counter API, compared to RBTree:

Insert 5_000 entries

rbtree -> `5627092211`
scertifiedbtreemap -> `9108725043` - x1.6 slower
scertifiedbtreemap (in batches of 10) -> `1354608056` - x4.1 faster

Witness 5_000 entries

rbtree -> `3273570622`
scertifiedbtreemap -> `3541619761` - x1.08 slower

Remove 5_000 entries

rbtree -> `9359364040`
scertifiedbtreemap -> `6693095737` - x1.4 faster
scertifiedbtreemap (in batches of 10) -> `731156025` - x12.8 faster

As we can see, batching is totally killing it.

5. Stable data structures tweaks

A lot of work was done to improve the stability and performance of the existing data structures. You can find up-to-date performance benchmarks here. Some highlights:

  • Completely reworked SBTreeMap. It is now only 2-3x slower than the std one.
  • SHashMap now uses zwohash and performs deletions without tombstones, which keeps its performance in shape instead of degrading over time. Also, it is now dangerously close in performance to the std HashMap.
  • Added the new SLog collection, which is an infinitely growable analog of SVec, heavily optimized for the most recent entries. Should be very handy for storing logs and histories.
  • Removed SBinaryHeap, since SBTreeMap is now simply way faster than my previous binary heap implementation was, and I don’t have the capacity at the moment to keep it up with the rest of the collections.
  • All keyed collections (maps and sets) now accept Borrow-ed keys for searching. For example, if you have a map like SHashMap<SBox<String>, u64>, you can search it simply by &String rather than &SBox<String>.
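The last point mirrors how std’s own maps work; a quick std-only sketch of the Borrow mechanism:

```rust
use std::collections::HashMap;

// A HashMap<String, u64> can be searched with a plain &str, because
// String: Borrow<str> - the same mechanism the stable maps now use.
fn lookup(map: &HashMap<String, u64>, key: &str) -> Option<u64> {
    map.get(key).copied()
}
```

No owned String (or boxed value) has to be built just to perform a lookup.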

6. Goodbye Speedy

Yes, Speedy is no longer used, and you’re no longer forced to use any particular serialization engine. In fact, the library now uses two types of serialization. You can find out more in this article.

7. Documentation

Yes, the library is finally covered with documentation. There is complete API documentation on docs.rs, and there are also a bunch of tutorials on important topics like upgradability, horizontal scaling and performance. You can find all of them in the Github repo.

8. Tests

I have never in my life written so many tests. There are currently around 120 tests, many of which are randomized. Some of them are completely fuzzy, randomizing even what kind of action they will perform on the data structure next. The whole test suite runs for about 20-25 minutes on my machine.
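That style of fuzzy test can be sketched like this (a toy version, with a std map standing in for the data structure under test):

```rust
use std::collections::BTreeMap;

// Tiny deterministic PRNG (an LCG), so a "fuzzy" run is reproducible.
fn next_rand(state: &mut u64) -> u64 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    *state >> 33
}

// Randomize which action to perform next, checking invariants as we go.
fn fuzz(iterations: u64, seed: u64) -> BTreeMap<u64, u64> {
    let mut map = BTreeMap::new();
    let mut s = seed;
    for _ in 0..iterations {
        let key = next_rand(&mut s) % 100;
        match next_rand(&mut s) % 3 {
            0 | 1 => {
                map.insert(key, key * 2);
            }
            _ => {
                map.remove(&key);
            }
        }
        // invariant check after every action
        assert!(map.len() <= 100);
    }
    map
}
```

Because the seed is fixed, a failing run can be replayed exactly, which is what makes randomized tests practical to debug.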

In other words, I did my best, guys. I really want it to be as safe to use as possible. But I also want to be clear that I’m the only one who tested it, I’m just a human, and bugs may appear anyway. Please let me know if you find one.

What’s next

As I said earlier, I consider this library a finished product that can be used as-is. That doesn’t mean I won’t add anything new to it or fix bugs - I will. But from now on my work on this library will be purely reactive - if you need something, let me know and together we’ll make it happen. If you have questions or want me to create a tutorial on how to handle a particular situation - let me know and we’ll figure it out together.

In other words, in order to improve this library, I now really need your feedback, guys.

Hope you’re having a great day!
Reach out to me here, in this forum thread, or on Github issues.

@domwoe @dymayday @saikatdas0790 @cymqqqq @hpeebles

11 Likes

Hi, I’m very excited to hear the amazing news!!
And I’m sorry to bother you. :grinning:
I haven’t read your source code completely yet, but I saw this demo code:
#[init]
fn init() {
  stable_memory_init();

  STATE.with(|s| {
    *s.borrow_mut() = Some(SVec::new());
  });
}

#[pre_upgrade]
fn pre_upgrade() {
  let state: State = STATE.with(|s| s.borrow_mut().take().unwrap());
  let boxed_state = SBox::new(state).expect("Out of memory");

  store_custom_data(0, boxed_state);

  stable_memory_pre_upgrade().expect("Out of memory");
}

#[post_upgrade]
fn post_upgrade() {
  stable_memory_post_upgrade();

  let state = retrieve_custom_data::<State>(0).unwrap().into_inner();
  STATE.with(|s| {
    *s.borrow_mut() = Some(state);
  });
}
I have a question about pre_upgrade and post_upgrade here: do you mean it can recycle memory automatically now? Like Motoko, which recycles memory automatically, with no need to manually recycle memory after upgrading the canister.

1 Like

Hey there!

I’m not quite sure what exactly you mean by that, so let me explain.

These functions - store_custom_data and retrieve_custom_data - change nothing: the data was stored in stable memory the whole time. What you do with these functions is store pointers to this data in a known place in stable memory, so you can find the data again after the upgrade is done.

And yes, with ic-stable-memory you no longer need to manually recycle any memory. Everything gets recycled automatically - both heap and stable memory - as soon as the value leaves its scope.

The idea is that, to an observer, any code written with ic-stable-memory now looks and feels like code using regular data structures which store data on the heap. But under the hood, all the data is stored in stable memory.

For you there is now no difference whether you use Vec or SVec, HashMap or SHashMap - the API is the same and the ownership rules are the same.

In this particular example that you’ve mentioned, nothing actually leaves the scope. In the pre_upgrade function you take your state (the part of it which lives on the stack, not in stable memory) from a static variable and move it to the custom data persistent storage, which now owns this data. Then you call stable_memory_pre_upgrade and persist the allocator itself in a known location as well. Nothing gets released.
Then in post_upgrade you restore the allocator by using stable_memory_post_upgrade (under the hood the allocator will put itself back into the same thread_local! static variable), then you retrieve your state from the custom data persistent storage and put it back into your own thread_local! static variable. Nothing gets released, nothing leaves the scope. The data that was in your state never leaves stable memory and doesn’t move inside it. Only pointers move back and forth.

Hope I made it clear.

1 Like

Thx for your answer :grinning:
I am ready to dive into your code and may have some questions for you.
And I plan to develop a dapp based on your project in the coming weeks.

1 Like

Also, everyone, please note that the old tutorials (from papy.rs) are no longer up-to-date and shouldn’t be relied on.

I tried to remove the links from my previous posts, but the forum engine doesn’t let me do it.

I think this is fantastic

2 Likes

The documentation now has two more sections:

  • Quick start section, which is an entry point for newcomers.
  • Under the hood section, which for now only contains diagrams about memory allocation, but in the future will become a richer source of inner details.

Hope you’ll find them useful.

7 Likes

Hello!

I’m trying to use an SHashMap with [u8; 136] as a key, however I’m hitting all sorts of problems. The main issue now seems to be with Candid serialization/deserialization… which is a bit beyond my Rust skills.

Has anyone got a working example of SHashMap with a [u8; N] type key?

Code below for context:

#[derive( Deserialize, Serialize, CandidType, StableType, Debug, Hash, Eq, PartialEq)]
struct IDKey([u8; 136]);
impl AsFixedSizeBytes for IDKey {
    const SIZE: usize = 136;
    type Buf = [u8; Self::SIZE]; // use Vec<u8> for generics  
    
    fn as_fixed_size_bytes(&self, buf: &mut [u8]) {
        let key_bytes = self.0.as_slice();
        buf[0] =  key_bytes.len() as u8;
        buf[1..(1 + key_bytes.len())].copy_from_slice(key_bytes);
    }
    
    fn from_fixed_size_bytes(buf: &[u8]) -> Self {
        let key_len = buf[0] as usize;
        let key: &[u8] = &buf[1..(1 + key_len)];
        return IDKey(key.try_into().unwrap());
    }
}

#[derive(AsFixedSizeBytes, Deserialize, StableType, Debug)]
pub struct Directory {
    pub id_to_ref: SHashMap<IDKey, u32>,
    pub ref_to_id: SHashMap<u32, IDKey>,
    pub next_ref: u32,
}

I know I could use an SBox, but I’m trying to avoid this as the Directory will have a lot of lookups/writes in my code and speed is key.

For info folks - managed to get it to compile… I didn’t realise [u8; SIZE] was considered a generic even when the size was known. I swapped the [u8; SIZE] array for a Vec<u8> and the errors went away.

#[derive(CandidType, Deserialize, StableType, Hash, Eq, PartialEq, Clone)]
pub struct IDKey(pub Vec<u8>);
impl AsFixedSizeBytes for IDKey {
    const SIZE: usize = 135;
    type Buf =  Vec<u8>; // use for generics  
    
    fn as_fixed_size_bytes(&self, buf: &mut [u8]) {
        let key_bytes = self.0.as_slice();
        buf[0] =  key_bytes.len() as u8;
        buf[1..(1 + key_bytes.len())].copy_from_slice(key_bytes);
    }
    
    fn from_fixed_size_bytes(buf: &[u8]) -> Self {
        let key_len = buf[0] as usize;
        let key: &[u8] = &buf[1..(1 + key_len)];
        return IDKey(key.try_into().unwrap());
    }
}

#[derive(StableType, AsFixedSizeBytes)]
pub struct Directory {
    pub id_to_ref: SHashMap<IDKey, u32>,
    pub ref_to_id: SHashMap<u32, IDKey>,
    pub next_ref: u32,
}
1 Like

Yea, AsFixedSizeBytes is implemented for any [u8; N] by default. So you could simply use the derive macro:

#[derive(AsFixedSizeBytes, Deserialize, Serialize, CandidType, StableType, Debug, Hash, Eq, PartialEq)]
struct IDKey([u8; 136]);

And keep it as an array instead of a Vec.

But it seems like serde does not support const generics yet, so in order to implement CandidType/Deserialize you would have to do something like this:

#[derive(AsFixedSizeBytes, StableType, Debug, Hash, Eq, PartialEq)]
struct IDKey([u8; 136]);

impl CandidType for IDKey {
    fn _ty() -> candid::types::Type {
        Vec::<u8>::_ty()
    }

    fn idl_serialize<S>(&self, serializer: S) -> Result<(), S::Error>
    where
        S: candid::types::Serializer,
    {
        self.0.idl_serialize(serializer)
    }
}

impl<'de> Deserialize<'de> for IDKey {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: serde::Deserializer<'de>,
    {
        let arr = Vec::<u8>::deserialize(deserializer)?.try_into().unwrap();

        Ok(IDKey(arr))
    }
}

The serialization side is fine, but deserialization will lose a bit of performance due to an allocation.
But this is still better performance-wise than using Vec<u8> as your key, since most of the time it will work exclusively on the stack and only allocate when deserialized (when received as an argument to a canister function).

1 Like

Thats awesome! Thank you :grin:

1 Like

Another day another question :rofl:

I’ve got stable memory working well, however I’m interested in how stable memory works alongside runtime state, and whether it’s possible to persist runtime state over upgrades without overwriting or damaging memory allocated for stable structures.

Code for reference:

#[derive(StableType, AsFixedSizeBytes, Debug, Default)]
pub struct Main {
    pub canister_data: CanisterSettings,
    pub processed_data: u64,
    pub directory_data: Directory,
}

#[derive(CandidType, Default, Clone)]
pub struct RuntimeState{
    pub latest_txs: BlockHolder,
    pub canister_logs: Vec<LogEntry>,
    pub temp_vec_ptx: Vec<ProcessedTX>,
    pub temp_vec_stx: Vec<SmallTX>
}

thread_local! {
    pub static RUNTIME_STATE: RefCell<RuntimeState> = RefCell::default();
    pub static STABLE_STATE: RefCell<Option<Main>> = RefCell::default();
}

pub fn state_init(){
    stable_memory_init();
    // init stable state
    let mut stable_data = Main::default();
    let default_admin = string_to_idkey(&"2vxsx-fae".to_string()).unwrap();
    let default_canister_name = string_to_idkey(&"Name Me Please!".to_string()).unwrap();
    stable_data.canister_data.authorised.push(default_admin).expect("Out of memory");
    stable_data.canister_data.canister_name = default_canister_name;
    STABLE_STATE.with(|state| {
        *state.borrow_mut() = Some(stable_data);
    });
    
    // init runtime state
    let mut runtime_state = RuntimeState::default();
    runtime_state.latest_txs.init();
    RUNTIME_STATE.with(|state| {
        *state.borrow_mut() = runtime_state;
    });
    log("Canister Initialised");
}

pub fn state_pre_upgrade(){
    let state: Main = STABLE_STATE.with(|s| s.borrow_mut().take().unwrap());
    let boxed_state = SBox::new(state).expect("Out of memory");
    store_custom_data(0, boxed_state);
    stable_memory_pre_upgrade().expect("Out of memory");
}

pub fn state_post_upgrade(){
    stable_memory_post_upgrade();
    let state: Main = retrieve_custom_data::<Main>(0).unwrap().into_inner();
    STABLE_STATE.with(|s| {
      *s.borrow_mut() = Some(state);
    });
}

The RUNTIME_STATE structs have a lot of Strings/Vecs and other dynamically sized stuff which I’d like to keep on the heap rather than in stable memory. Is this possible?

Thanks in advance!

Nathan.

Hey Nathan,

You can store your RUNTIME_STATE the same way you’re storing your STABLE_STATE.
Just put it inside an SBox and use the store_custom_data() function with another id.

Since you don’t implement StableType and AsDynSizeBytes for RuntimeState, I assume you can’t do that for some reason. So, in this situation, the way to resolve it is to simply encode/decode it manually, using the Candid encoding functions encode_one() and decode_one(). Then you can put the resulting byte array into an SBox.

// I didn't check the following code, before writing it here
// if it doesn't work, please let me know

pub fn state_pre_upgrade(){
    let state: Main = STABLE_STATE.with(|s| s.borrow_mut().take().unwrap());
    let boxed_state = SBox::new(state).expect("Out of memory");
    store_custom_data(0, boxed_state);

    // RefCell supports .take() just like Option does
    let rstate = RUNTIME_STATE.take();
    let bytes = encode_one(rstate).expect("Unable to candid encode");
    let boxed_bytes = SBox::new(bytes).expect("Out of memory");
    store_custom_data(1, boxed_bytes);

    stable_memory_pre_upgrade().expect("Out of memory");
}

pub fn state_post_upgrade(){
    stable_memory_post_upgrade();
    let state: Main = retrieve_custom_data::<Main>(0).unwrap().into_inner();
    STABLE_STATE.with(|s| {
      *s.borrow_mut() = Some(state);
    });

    let bytes: Vec<u8> = retrieve_custom_data(1).unwrap().into_inner();
    let rstate: RuntimeState = decode_one(&bytes).expect("Unable to candid decode");
    RUNTIME_STATE.replace(rstate);
}
1 Like

Hey Buddy - Thanks for your help once again!

I’ve had a go at this but am still hitting a bit of an issue. I got compile errors for take() and replace() being unstable, but got around those with the following code:

pub fn state_pre_upgrade(){
    // Stable Storage
    let state: Main = STABLE_STATE.with(|s| s.borrow_mut().take().unwrap());
    let boxed_state = SBox::new(state).expect("Out of memory");
    store_custom_data(0, boxed_state);

    // Runtime Storage
    let rstate = RUNTIME_STATE.with(|s|{s.borrow_mut().to_owned()});
    let bytes = encode_one(rstate).expect("Unable to candid encode");
    let boxed_bytes = SBox::new(bytes).expect("Out of memory");
    store_custom_data(1, boxed_bytes);

    stable_memory_pre_upgrade().expect("Out of memory");
}

pub fn state_post_upgrade(){
    stable_memory_post_upgrade();
    let state: Main = retrieve_custom_data::<Main>(0).unwrap().into_inner();
    STABLE_STATE.with(|s| {
      *s.borrow_mut() = Some(state);
    });

    // Runtime Storage 
    let bytes: Vec<u8> = retrieve_custom_data::<RuntimeState>(1).unwrap().into_inner();
    let rstate: RuntimeState = decode_one(&bytes).expect("Unable to candid decode");
    RUNTIME_STATE.with(|s| {
        *s.borrow_mut() = rstate;
      });
}

However, this then throws an error on .into_inner() for trait bounds not satisfied - AsDynSizeBytes and StableType.

So I had a bash at deriving both for RuntimeState. AsDynSizeBytes wasn’t a problem as I can use SBoxes to wrap the dynamic stuff; however, I hit a bit of a dead end with StableType.

RuntimeState has a VecDeque, which I don’t think is supported at all. Removing this from the struct still gives issues with any Vec which contains a struct. I think StableType only compiles if the type is a Rust standard type (String etc.) or a Principal?

It’s not a massive issue if I can’t save RUNTIME_STATE during upgrades in this canister… but would be a nice-to-have.

Interested in your thoughts,

Cheers,

Nathan.