Weird corruption in ic-stable-structures

Hi, I have something really weird going on with ic-stable-structures. It started happening today when I refactored some code and ended up with two memories in different crates.

I have the “user” canister and a shared crate called state. The state crate contains:

four RefCells, each using a different memory ID. It’s much easier for these to live in a separate crate so I can use helper methods that are not in the actor.

This is how I use mimic to set up the user canister

The MEMORY_MANAGER lives in the state crate. What this then does is initialise a UserIndex under memory ID 20 and a Sharder under memory ID 21.

This is how the Sharder is initialised:

Ok so this is the PART THAT CRASHES …

2025-02-12 15:53:17.817183509 UTC: [Canister fhj5a-wmaaa-aaaaa-qaa2q-cai] INFO: trying to insert key 'fokw4-aeaaa-aaaaa-qaa3a-cai' shard 'CanisterShard { pid: Principal { len: 10, bytes: [128, 0, 0, 0, 0, 16, 0, 54, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] }, users: 0 }'
2025-02-12 15:53:17.817183509 UTC: [Canister fhj5a-wmaaa-aaaaa-qaa2q-cai] Panicked at 'Attempting to allocate an already allocated chunk.', /home/adam/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ic-stable-structures-0.6.7/src/btreemap/allocator.rs:166:9
2025-02-12 15:53:17.817183509 UTC: [Canister b77ix-eeaaa-aaaaa-qaada-cai] Panicked at 'called `Result::unwrap()` on an `Err` value: ResponseError(IcError(CallRejected("IC0503: Error from Canister fhj5a-wmaaa-aaaaa-qaa2q-cai: Canister called `ic0.trap` with message: Panicked at 'Attempting to allocate an already allocated chunk.', /home/adam/.cargo/registry/src/index.crates.io-6f17d22bba15001f/ic-stable-structures-0.6.7/src/btreemap/allocator.rs:166:9\nCanister Backtrace:\nic_cdk::api::trap\nic_cdk::printer::set_panic_hook::{{closure}}\nstd::panicking::rust_panic_with_hook\nstd::panicking::begin_panic_handler::{{closure}}\nstd::sys::backtrace::__rust_end_short_backtrace\nrust_begin_unwind\ncore::panicking::panic_fmt\nic_stable_structures::btreemap::allocator::Allocator<M>::allocate\nic_stable_structures::btreemap::BTreeMap<K,V,M>::allocate_node\nic_stable_structures::btreemap::BTreeMap<K,V,M>::insert\napi::state::shared::sharder::Sharder::register_shard\napi::state::shared::sharder::Sharder::register_shard_api\nstd::thread::local::LocalKey<core::cell::RefCell<T>>::with_borrow_mut\ncanister_user::register::{{closure}}\ncanister_user::__canister_method_init_async::{{closure}}\nic_cdk::futures::waker::wake\nic_cdk::api::call::callback\n.\nConsider gracefully handling failures from this canister or altering the canister to handle exceptions.

Very weird. Ok so what I do now is put in a clear_new() statement at the start of the function.

It works… but now one of the endpoints that should return SubnetIndex returns Sharder instead.

2025-02-12 16:40:44.354824855 UTC: [Canister eefug-cuaaa-aaaaa-qaa4a-cai] Panicked at 'called `Result::unwrap()` on an `Err` value: Deserialize("failed to deserialize: Semantic(None, \"invalid type: map, expected enum\") (Map([(Text(\"pid\"), Bytes([128, 0, 0, 0, 0, 16, 0, 57, 1, 1])), (Text(\"users\"), Integer(Integer(0)))]))")', src/api/src/lib.rs:93:1

I am completely out of ideas. The only thing I can think of is that, now that the same memory manager is used across two crates, it’s being corrupted somehow.

Thanks,
Adam

Hi, could you share a bit more of your code so we can understand better how it interacts? Perhaps a repository link?

E.g., when you say “crate”, do you really mean ‘crate’ or rather ‘module’?

Hey, so it’s a big project. I’ll share what I can; please ask if you need more context or have questions.


These are the canisters in dfx.json. The mimic crate provides a couple of helpful macros to initialise the MEMORY_MANAGER and to let you easily reference the stores you added in the mimic schema.

The Dragginz code has four shared stable structures that every canister uses.

  • AppState - so we can turn the app on and off without constantly pinging between canisters. The structure cascades from root down to all child canisters.
  • SubnetIndex - a map of CanisterType to Principal, so we know where each singleton canister lives
  • CanisterState - a canister-specific Cell that contains root_id, parent_id
  • ChildIndex - every time a canister creates a child canister, it’s put in this map; this is what’s used for the cascade process.

Currently these four are in the dragginz/api crate where we store helper methods, stable stores, authentication and macros to generate endpoints.

There are two other types of stable store we use, other than the four ‘core’ stores.

  • Sharder - this allows any canister to manage multiple children and load balance them. This is what the User canister uses to keep track of Game canisters and Instance canisters, where players are assigned.
  • UserIndex - this is User-specific. It’s the map of the User’s principal to the Game canister, and the Player ID.

MEMORY_MANAGER is defined in the api crate, along with the core states:

///
/// CORE STATE
/// every canister implements these
///
/// AppState and SubnetIndex live on root, and can be cached on other canisters
/// Every canister has its own CanisterState
///

// global memory ids are hardcoded
const APP_STATE_MEMORY_ID: u8 = 1;
const SUBNET_INDEX_MEMORY_ID: u8 = 2;
const CANISTER_STATE_MEMORY_ID: u8 = 3;
const CHILD_INDEX_MEMORY_ID: u8 = 4;

thread_local! {

    ///
    /// APP_STATE
    ///
    /// Scope     : Application
    /// Structure : Cell
    ///
    /// a Cell that's only really meant for small data structures used for global app state
    ///
    /// defaults to Enabled as then it's possible for non-controllers to call
    /// endpoints in order to initialise
    ///

    pub static APP_STATE: RefCell<AppState> = RefCell::new(AppState::init(
        MEMORY_MANAGER.with_borrow(|this| this.get(MemoryId::new(APP_STATE_MEMORY_ID))),
        AppMode::Enabled,
    ));

    ///
    /// CANISTER_STATE
    ///
    /// Scope     : Canister
    /// Structure : Cell
    ///

    pub static CANISTER_STATE: RefCell<CanisterState> = RefCell::new(CanisterState::init(
        MEMORY_MANAGER.with_borrow(|this| this.get(MemoryId::new(CANISTER_STATE_MEMORY_ID))),
    ));

    ///
    /// CHILD_INDEX
    ///
    /// Scope     : Canister
    /// Structure : BTreeMap
    ///

    pub static CHILD_INDEX: RefCell<ChildIndex> = RefCell::new(ChildIndex::init(
        MEMORY_MANAGER.with_borrow(|this| this.get(MemoryId::new(CHILD_INDEX_MEMORY_ID))),
    ));

    ///
    /// SUBNET_INDEX
    ///
    /// Scope     : Subnet
    /// Structure : BTreeMap
    ///

    pub static SUBNET_INDEX: RefCell<SubnetIndex> = RefCell::new(SubnetIndex::init(
        MEMORY_MANAGER.with_borrow(|this| this.get(MemoryId::new(SUBNET_INDEX_MEMORY_ID))),
    ));

}

state_init!

#[macro_export]
macro_rules! state_init {
    ($name:ident, $state:ty, $memory_id:expr) => {
        thread_local! {
            pub static $name: ::std::cell::RefCell<$state> = ::std::cell::RefCell::new(<$state>::init(
                MEMORY_MANAGER.with_borrow(|this| {
                    this.get(::mimic::ic::structures::memory::MemoryId::new($memory_id))
                }),
            ));
        }
    };
}

I’ve tried without the macro and I get the same error, so I think it’s working as I intended it to.
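One thing worth checking with a macro like this is which MEMORY_MANAGER it actually sees. A minimal sketch, assuming nothing about mimic's internals: a macro_rules! body that names an item without a path resolves that name where the macro is invoked, not where it is defined. The modules api and user and the MANAGER constants below are hypothetical stand-ins, not the real project layout.

```rust
// Toy stand-in for `state_init!`: the body names `MANAGER` without a path,
// so the name is looked up at each call site.
macro_rules! which_manager {
    () => {
        MANAGER
    };
}

mod api {
    // Hypothetical manager defined next to the core state.
    pub const MANAGER: &str = "manager defined in api";

    pub fn lookup() -> &'static str {
        which_manager!()
    }
}

mod user {
    // Hypothetical second manager defined in the canister crate.
    pub const MANAGER: &str = "manager defined in user";

    pub fn lookup() -> &'static str {
        which_manager!()
    }
}

fn main() {
    // The same macro body picks up a different item in each module.
    println!("{}", api::lookup());
    println!("{}", user::lookup());
}
```

So if the invoking crate has its own MEMORY_MANAGER in scope, a macro written this way would use that one rather than the one next to the core state.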

In the user canister, I initialise mimic first, which initialises the schema, using a build.rs script to import it at build time.

//
// MIMIC
//

mimic_start!("../../../mimic.toml");
mimic_memory_manager!(MEMORY_MANAGER);

const fn _init() {}

async fn _init_async() {
    register_admin_users().await.unwrap();
}

Then I use the state_init! macro to generate the stable structures that are specific to this canister:

state_init!(USER_INDEX, UserIndex, 20);
state_init!(GAME_SHARDER, Sharder, 21);
endpoints!("user");

This has been working fine for the last 18 months. The errors started when I moved USER_INDEX and GAME_SHARDER out of the api crate and into User.

Here is our register_user method. You can see how USER_INDEX and GAME_SHARDER work here.

// register
// private function to handle all registrations
async fn register(user_pid: Principal) -> Result<User, Error> {
    //
    // find next game canister
    //
    let game_canister_pid =
        if let Some(next_pid) = GAME_SHARDER.with_borrow(|this| this.get_next_canister_pid()) {
            next_pid
        } else {
            // send request to create a new game canister
            let pid = api::interface::request::canister_create_api(CanisterType::Game).await?;

            // insert canister shard into game sharder
            let shard = CanisterShard::new(pid);
            GAME_SHARDER.with_borrow_mut(|this| this.register_shard_api(pid, shard))?;

            pid
        };

    //
    // call init_player on the new game canister
    //
    let player_id = api::interface::ic::call_api::<_, (Result<Ulid, Error>,)>(
        game_canister_pid,
        "init_player",
        (user_pid,),
    )
    .await?
    .0?;

    // update user index
    let user = User::new(user_pid, game_canister_pid, player_id, &[]);
    USER_INDEX.with_borrow_mut(|this| this.register_user_api(user_pid, user.clone()))?;

    Ok(user)
}

It’s the GAME_SHARDER.with_borrow_mut line that does the insert and then causes the error.


2025-02-12 17:25:14.529131208 UTC: [Canister b77ix-eeaaa-aaaaa-qaada-cai] INFO: call: subnet_index_cascade@e7aid-ymaaa-aaaaa-qaa6q-cai
2025-02-12 17:25:14.529131208 UTC: [Canister e7aid-ymaaa-aaaaa-qaa6q-cai] Panicked at 'called `Result::unwrap()` on an `Err` value: Deserialize("failed to deserialize: Semantic(None, \"invalid type: map, expected enum\") (Map([(Text(\"pid\"), Bytes([128, 0, 0, 0, 0, 16, 0, 62, 1, 1])), (Text(\"users\"), Integer(Integer(0)))]))")', src/api/src/lib.rs:93:1

So when we work around the failing insert by calling clear(), the insert works and the User canister successfully creates the game canister.

Then after all canisters are created, we call subnet_index_cascade() …

// subnet_index_cascade
pub async fn subnet_index_cascade() -> Result<(), CascadeError> {
    let subnet_index = SUBNET_INDEX.with_borrow(|this| this.get_data());
    let child_index = CHILD_INDEX.with_borrow(|this| this.get_data());

    // iterate child canisters
    for (id, ty) in child_index {
        log!(Log::Info, "subnet_index_cascade: -> {id} ({ty})",);

        call::<_, (Result<(), CascadeError>,)>(id, "subnet_index_cascade", (subnet_index.clone(),))
            .await?
            .0?;
    }

    Ok(())
}

Every call is logged, so here’s the call…

INFO: call: subnet_index_cascade@e7aid-ymaaa-aaaaa-qaa6q-cai

error

[Canister b77ix-eeaaa-aaaaa-qaada-cai] Panicked at 'called `Result::unwrap()` on an `Err` value: IcError(CallRejected("IC0503: Error from Canister e7aid-ymaaa-aaaaa-qaa6q-cai: Canister called `ic0.trap` with message: Panicked at 'called `Result::unwrap()` on an `Err` value: Deserialize(\"failed to deserialize: Semantic(None, \\\"invalid type: map, expected enum\\\") (Map([(Text(\\\"pid\\\"), Bytes([128, 0, 0, 0, 0, 16, 0, 62, 1, 1])), (Text(\\\"users\\\"), Integer(Integer(0)))]))\")', src/api/src/lib.rs:93:1

So why is the result of this call, which should be a SubnetIndex from memory ID 2, somehow coming back as a Sharder, which lives under memory ID 21?


I think I may have two competing memory managers, gimme a bit


Yeah, worked it out.

What I had done was instantiate two memory managers, one in the api crate and one in the user crate. It wasn’t the memory IDs conflicting; it was that the different memory IDs were being allocated through two separate memory managers.
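To illustrate why that goes wrong even with distinct memory IDs, here is a toy model using only std. It is not how ic-stable-structures is implemented (the real MemoryManager has its own header and bucket layout); it only assumes that each manager keeps its own private table of which bucket belongs to which memory ID. ToyManager, BUCKET_SIZE and the byte strings are all made up for the sketch.

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;
use std::rc::Rc;

const BUCKET_SIZE: usize = 8;

/// Toy stand-in for the single raw stable memory of a canister.
type Raw = Rc<RefCell<Vec<u8>>>;

/// Toy manager: remembers, in its own private table, which bucket each memory ID owns.
struct ToyManager {
    raw: Raw,
    buckets: BTreeMap<u8, usize>, // memory ID -> bucket index
}

impl ToyManager {
    fn new(raw: Raw) -> Self {
        Self { raw, buckets: BTreeMap::new() }
    }

    /// Give `id` the next bucket that THIS manager believes is free.
    fn bucket_for(&mut self, id: u8) -> usize {
        let next = self.buckets.len();
        *self.buckets.entry(id).or_insert(next)
    }

    fn write(&mut self, id: u8, data: &[u8]) {
        assert!(data.len() <= BUCKET_SIZE);
        let start = self.bucket_for(id) * BUCKET_SIZE;
        let mut raw = self.raw.borrow_mut();
        if raw.len() < start + BUCKET_SIZE {
            raw.resize(start + BUCKET_SIZE, 0);
        }
        raw[start..start + data.len()].copy_from_slice(data);
    }

    fn read(&mut self, id: u8, len: usize) -> Vec<u8> {
        let start = self.bucket_for(id) * BUCKET_SIZE;
        self.raw.borrow()[start..start + len].to_vec()
    }
}

/// Two managers over the same raw memory, using different memory IDs.
fn demo() -> (Vec<u8>, Vec<u8>) {
    let raw: Raw = Rc::new(RefCell::new(Vec::new()));
    let mut api_manager = ToyManager::new(raw.clone());
    let mut user_manager = ToyManager::new(raw.clone());

    api_manager.write(2, b"subnets");
    let before = api_manager.read(2, 7);

    // The second manager has an empty table, so it also hands out bucket 0.
    user_manager.write(21, b"sharder");
    let after = api_manager.read(2, 7);

    (before, after)
}

fn main() {
    let (before, after) = demo();
    println!("memory 2 before: {}", String::from_utf8_lossy(&before));
    println!("memory 2 after:  {}", String::from_utf8_lossy(&after));
}
```

In this model, reading memory ID 2 returns the bytes written under memory ID 21, which is the same shape as the SubnetIndex endpoint returning Sharder data.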

Now I just define MEMORY_MANAGER once, in the state crate, and use that one in the User crate.

This is like a super corner case, but if there’s any way you can warn with a decent error message, it may help stop people making the same mistake.
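As a rough idea of what such a warning could look like on the application side, here is a hypothetical guard using only std. Nothing like this exists in ic-stable-structures as far as this thread shows; ManagerGuard and claim are invented names, and in a canister the guard would presumably live in a thread_local next to the manager.

```rust
/// Hypothetical guard that remembers who created the memory manager first.
#[derive(Default)]
struct ManagerGuard {
    owner: Option<String>,
}

impl ManagerGuard {
    /// Record `who` as the creator, or return a descriptive error if a
    /// manager has already been claimed by someone else.
    fn claim(&mut self, who: &str) -> Result<(), String> {
        if let Some(first) = &self.owner {
            return Err(format!(
                "memory manager already created by `{}`; `{}` must reuse it",
                first, who
            ));
        }
        self.owner = Some(who.to_string());
        Ok(())
    }
}

fn main() {
    let mut guard = ManagerGuard::default();
    println!("{:?}", guard.claim("api"));
    println!("{:?}", guard.claim("user"));
}
```

The first claim succeeds and the second one fails with a message naming both crates, which would have pointed straight at the duplicate manager.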

Thanks,
Adam


Solved through the power of rubberducking :duck:. Thanks for sharing the resolution. Also, your ask for a warning is noted.


My wife is glad that she wasn’t the rubber duck for once!
