Canister traps after upgrading ic_cdk and ic-* crates to v0.18

Hi everyone,

After upgrading ic-cdk and all related ic-* crates in the project, the canister started trapping during execution. It was working fine before the upgrade, and I haven’t made any major logic changes.

Here is the error I’m getting:

```
[ic-cdk-timers] canister_global_timer: CallRejected(CallRejected { raw_reject_code: 5, reject_message: "IC0502: Error from Canister 7uieb-cx777-77776-qaaaq-cai: Canister trapped: unreachable.\nConsider gracefully handling failures from this canister or altering the canister to handle exceptions. See documentation: https://internetcomputer.org/docs/current/references/execution-errors#trapped" })
test gldt_stake_suite::tests::user_flows::withdraw_flow_test ... FAILED
```
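For reference, `raw_reject_code: 5` is CANISTER_ERROR in the IC interface specification's list of reject codes (1 SYS_FATAL through 6 SYS_UNKNOWN): the callee trapped rather than explicitly rejecting. Here the callee appears to be the canister itself, since the reject message names 7uieb-cx777-77776-qaaaq-cai and ic-cdk-timers runs each job as an update call to `<ic-cdk internal> timer_executor` (visible in the backtrace later in this thread). The decoder below is a hypothetical, std-only helper for reading such logs; ic-cdk has its own reject-code type that you would normally use instead.

```rust
/// Reject codes as listed in the IC interface specification.
/// Hypothetical standalone enum; ic-cdk ships its own equivalent type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RejectCode {
    SysFatal,
    SysTransient,
    DestinationInvalid,
    CanisterReject,
    CanisterError,
    SysUnknown,
}

/// Maps the number printed as `raw_reject_code` to its name.
/// Returns `None` for values the specification does not define.
pub fn decode_reject_code(raw: u32) -> Option<RejectCode> {
    match raw {
        1 => Some(RejectCode::SysFatal),
        2 => Some(RejectCode::SysTransient),
        3 => Some(RejectCode::DestinationInvalid),
        4 => Some(RejectCode::CanisterReject),
        5 => Some(RejectCode::CanisterError), // the callee trapped
        6 => Some(RejectCode::SysUnknown),
        _ => None,
    }
}
```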

I checked every place in my own code that could hit an unreachable and everything looks fine, so my only guess was that it might be related to this:

Any insights or suggestions would be much appreciated!

Thanks in advance :folded_hands:

Not an answer, just sharing in case it helps. Here is the thread about the issues I encountered when migrating to the ic_cdk v0.18 crates.

Beyond one or two fixes within ic_cdk itself (are you using v0.18.5?), one of the main issues I ran into was resolved by making all my spawned tasks use spawn_017_compat. The behavior of spawn changed, which caused some features to trap.
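To illustrate the kind of ordering change involved, here is a toy model in plain Rust. It assumes, as I understand the v0.18 change, that `spawn` only queues the task to start later, while `spawn_017_compat` starts it immediately as v0.17 did. `ToyExecutor`, `spawn_eager`, `spawn_deferred` and `handler` are hypothetical names, not ic_cdk APIs, and real tasks are futures that run until their first `.await`; here each task is a single closure for simplicity.

```rust
use std::collections::VecDeque;

/// In this toy model a task is a closure that appends to a shared log.
pub type Task = Box<dyn FnOnce(&mut Vec<String>)>;

/// Hypothetical stand-in for a canister's task queue (not the ic_cdk executor).
pub struct ToyExecutor {
    pub log: Vec<String>,
    queue: VecDeque<Task>,
}

impl ToyExecutor {
    pub fn new() -> Self {
        ToyExecutor { log: Vec::new(), queue: VecDeque::new() }
    }

    /// Models the v0.17-style behavior (spawn_017_compat): the task runs
    /// immediately, inside the spawn call.
    pub fn spawn_eager(&mut self, task: Task) {
        task(&mut self.log);
    }

    /// Models the v0.18-style behavior (spawn): the task is only queued here.
    pub fn spawn_deferred(&mut self, task: Task) {
        self.queue.push_back(task);
    }

    /// Runs queued tasks; in this model that happens once the handler body is done.
    pub fn run_queued(&mut self) {
        while let Some(task) = self.queue.pop_front() {
            task(&mut self.log);
        }
    }
}

/// A toy handler: spawns a task, logs a line of its own, then lets queued tasks run.
pub fn handler(exec: &mut ToyExecutor, eager: bool) {
    let task: Task = Box::new(|log: &mut Vec<String>| log.push("task".to_string()));
    if eager {
        exec.spawn_eager(task);
    } else {
        exec.spawn_deferred(task);
    }
    exec.log.push("after spawn".to_string());
    exec.run_queued();
}
```

With the eager variant the log reads `task, after spawn`; with the deferred one it reads `after spawn, task`. Code that relies on a spawned task's side effects being visible right after the spawn call returns is the kind of code this change can affect.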


I had been using 0.18.3 and also tried 0.18.5. I also tried replacing spawn with spawn_017_compat, but it made no difference. I suspect the issue might be in ic-cdk-timers itself.

Probably a dumb question, but did you also upgrade the related crate (ic-cdk-timers), e.g. to v0.12.2?

I just double-checked my PR. Aside from what I shared above, I didn’t change anything else. One issue was indeed related to spawn inside a timer, but using spawn_017_compat resolved it. Tagging @AdamS, who helped debug and knows best!


As far as I can see, v0.12.0 is being used.


A pity, I don’t have any further ideas :cry:. Hopefully Adam, or another expert, can help!


I’ve narrowed down the source of the issue by adding more logs: the trap is related to creating a new canister. Here is a bit more context, with the code and the corresponding logs:

```rust
let settings = CanisterSettings {
    controllers: Some(self.controllers.clone()),
    compute_allocation: None,
    wasm_memory_threshold: None,
    memory_allocation: None,
    freezing_threshold: None,
    reserved_cycles_limit: Some(Nat::from(self.reserved_cycles)),
    log_visibility: Some(LogVisibility::Public),
    wasm_memory_limit: None,
};
trace(&format!(
    "create_canister: constructed CanisterSettings: {:?}",
    settings
));

let args = &CreateCanisterArgs {
    settings: Some(settings.clone()),
};
trace("create_canister: calling create_canister with args");

match create_canister_with_extra_cycles(args, self.initial_cycles as u128).await {
    Ok(canister) => {
        canister_id = canister.canister_id;
        trace(&format!(
            "create_canister: successfully created new canister: {}",
            canister_id
        ));
    }
    Err(e) => {
        trace(&format!(
            "create_canister: failed to create canister after retries: {:?}",
            e
        ));
        return Err(NewCanisterError::CreateCanisterError(format!("{e:?}")));
    }
}

trace("create_canister: adding new canister to fund manager");
add_canisters_to_fund_manager(
    &mut self.fund_manager,
    self.funding_config.clone(),
    vec![canister_id],
);
trace("create_canister: added canister to fund manager");
```

And the corresponding logs:

```
2024-12-13 12:03:00.000000019 UTC: [Canister 7uieb-cx777-77776-qaaaq-cai] create_canister: constructed CanisterSettings: CanisterSettings { controllers: Some([Principal { len: 10, bytes: [255, 255, 255, 255, 255, 208, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] }, Principal { len: 10, bytes: [255, 255, 255, 255, 255, 208, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] }]), compute_allocation: None, memory_allocation: None, freezing_threshold: None, reserved_cycles_limit: Some(Nat(100000000000000000)), log_visibility: Some(Public), wasm_memory_limit: None, wasm_memory_threshold: None }
2024-12-13 12:03:00.000000019 UTC: [Canister 7uieb-cx777-77776-qaaaq-cai] create_canister: calling create_canister with args
2024-12-13 12:03:00.000000019 UTC: [Canister 7uieb-cx777-77776-qaaaq-cai] [ic-cdk-timers] canister_global_timer: CallRejected(CallRejected { raw_reject_code: 5, reject_message: "IC0502: Error from Canister 7uieb-cx777-77776-qaaaq-cai: Canister trapped: unreachable.\nConsider gracefully handling failures from this canister or altering the canister to handle exceptions. See documentation: https://internetcomputer.org/docs/current/references/execution-errors#trapped" })
test gldt_stake_suite::tests::user_flows::withdraw_flow_test ... FAILED
```

I also fixed the canister creation function in our canister, because it turned out it was still using deprecated methods: I replaced ic_cdk::api::management_canister::main::create_canister with ic_cdk::management_canister::create_canister_with_extra_cycles, but that didn’t help.

@AdamS and @peterparker, I hope this gives more information. Is it possible that I’m doing something wrong, or is the error related to the create_canister method?

I don’t use create_canister_with_extra_cycles, so I can’t really compare, and I don’t know about the underlying changes either. Feedback from Adam or someone else on the ic_cdk team is needed to answer your question.

That said, I do use create_canister (usage in Juno’s source code), which continues to work as expected: my test suite passes, and I’ve also tested it manually locally. I didn’t make any changes related to that function when upgrading the crate. Sorry I can’t provide more specific insights :pensive_face:.

Edit: Your question was already shared internally. I just pinged it again for visibility.


Some additional information: I re-exported the create_canister_with_extra_cycles function into my own code and traced at which point it fails, and it appears the problem is here:

```rust
pub async fn create_canister_with_extra_cycles(
    arg: &CreateCanisterArgs,
    extra_cycles: u128,
) -> CallResult<CreateCanisterResult> {
    let complete_arg = CreateCanisterArgsComplete {
        settings: arg.settings.clone(),
        sender_canister_version: Some(canister_version()),
    };

    let cycles = cost_create_canister() + extra_cycles;

    // <- the problem appears to be in this call
    Ok(
        Call::unbounded_wait(Principal::management_canister(), "create_canister")
            .with_arg(&complete_arg)
            .with_cycles(cycles)
            .await?
            .candid()?,
    )
}
```

All the code executes fine up to this call:

```rust
Call::unbounded_wait(Principal::management_canister(), "create_canister")
    .with_arg(&complete_arg)
    .with_cycles(cycles)
    .await?
    .candid()?,
```

Since you’ve now re-exported the function, maybe, just for a test, you could try calling ic_cdk::api::management_canister::main::create_canister (the function that I’m using) instead of Call::unbounded_wait?

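```rust
pub async fn create_canister_with_extra_cycles(
    arg: &CreateCanisterArgs,
    extra_cycles: u128,
) -> CallResult<CreateCanisterResult> {
    let cycles = cost_create_canister() + extra_cycles;

    // Test only: go through the deprecated API instead of Call::unbounded_wait.
    // Its argument and result types are the older v0.17 ones, so `arg` has to be
    // converted into that API's argument type, and the result converted back,
    // before this compiles.
    #[allow(deprecated)]
    let result = ic_cdk::api::management_canister::main::create_canister(arg, cycles).await;

    // ...convert `result` into CallResult<CreateCanisterResult> and return it here.
    todo!()
}
```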

create_canister and the function it calls, call_with_payment128, are deprecated, but they appear to have a similar implementation to the one in the previous version of the CDK (if I understand the code correctly). If using them works, that would suggest something is off with the new unbounded-wait call. If it doesn’t, it won’t tell us much, but it may be worth a try anyway.

The error message ‘trapped’ (as opposed to ‘trapped explicitly’) indicates that this is actually the ‘unreachable’ instruction in Wasm. The most common cause is a panic, except that the CDK has a panic handler which converts panics into explicit traps; the next most common is integer division by zero, but the CDK doesn’t do that, and the snippet you posted doesn’t seem to either. It is likely, but not certain, that the canister creation is a red herring, and that the trap comes from other code that executes after the task suspends at the await.
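Why the last trace line can be misleading is easiest to see in a toy round-robin model. In this hypothetical, std-only sketch (`Step` and `run` are illustrative names, not part of any IC crate), each task is a list of steps; at every `Await` the scheduler switches to the next task, and it stops at the first `Trap`. On the IC the "other task" may just as well be a different message, such as a timer job.

```rust
/// One step of a toy task.
#[derive(Clone, Copy)]
pub enum Step {
    Log(&'static str),
    Await, // the task suspends; other tasks get to run
    Trap,
}

/// Runs tasks round-robin, switching at every `Await`, stopping at the first `Trap`.
/// Returns the trace lines written so far and the index of the task that trapped.
pub fn run(tasks: &[Vec<Step>]) -> (Vec<&'static str>, Option<usize>) {
    let mut log = Vec::new();
    let mut pos = vec![0usize; tasks.len()];
    loop {
        let mut progressed = false;
        for (i, task) in tasks.iter().enumerate() {
            while pos[i] < task.len() {
                let step = task[pos[i]];
                pos[i] += 1;
                progressed = true;
                match step {
                    Step::Log(msg) => log.push(msg),
                    Step::Await => break,
                    Step::Trap => return (log, Some(i)),
                }
            }
        }
        if !progressed {
            return (log, None); // every task ran to completion
        }
    }
}
```

With task 0 logging `calling create_canister` and then awaiting, and task 1 trapping right away, the only trace line printed belongs to task 0, yet it is task 1 that trapped. A backtrace, not the order of trace lines, identifies the trapping code.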

The IC can give you a backtrace when a trap occurs, if the wasm module contains a name section. If you are on the latest version of dfx, the latest version of the CDK, using the rust canister type, then the name section of the wasm module should still be present. If you’re using a custom canister with an ic-wasm invocation, you’ll need to update to the latest version of ic-wasm, then add the --keep-name-section flag to any optimize and shrink steps. Once you have a name section, you should be able to see which function in particular is causing the trap.
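To confirm that the name section actually survived the build's optimize and shrink steps, you can look for a custom section called `name` in the module you deploy. The helper below is a hypothetical, std-only sketch (not part of dfx or ic-wasm) that walks the Wasm section headers; it expects an uncompressed module, so a `.wasm.gz` would need to be decompressed first.

```rust
/// Reads an unsigned LEB128-encoded u32 at `*pos`, advancing `*pos` past it.
pub fn read_leb_u32(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let mut result: u32 = 0;
    let mut shift = 0;
    loop {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        if shift >= 32 {
            return None; // too many continuation bytes for a u32
        }
        result |= u32::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
        shift += 7;
    }
}

/// Returns true if the Wasm module contains a custom section named "name".
pub fn has_name_section(wasm: &[u8]) -> bool {
    // Magic number "\0asm" followed by version 1 (little-endian u32).
    if !wasm.starts_with(b"\0asm\x01\0\0\0") {
        return false;
    }
    let mut pos = 8;
    while pos < wasm.len() {
        let id = wasm[pos];
        pos += 1;
        let size = match read_leb_u32(wasm, &mut pos) {
            Some(s) => s as usize,
            None => return false,
        };
        let end = match pos.checked_add(size) {
            Some(e) if e <= wasm.len() => e,
            _ => return false, // truncated or malformed section
        };
        if id == 0 {
            // Custom section: the payload starts with a length-prefixed name.
            let mut p = pos;
            if let Some(len) = read_leb_u32(wasm, &mut p) {
                let name_end = p.saturating_add(len as usize);
                if name_end <= end && &wasm[p..name_end] == b"name" {
                    return true;
                }
            }
        }
        pos = end; // skip to the next section
    }
    false
}
```

For example, pass it the bytes from `std::fs::read` of the final module; if it returns false, the backtrace will not be able to show function names.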


Thanks for the reply!

And thank you very much for the advice about canister backtraces, I had no idea that was possible at all.

After adding the name section, I got the following output, which is not very clear:

```
[15] [TRAP]: unreachable
Canister Backtrace:
core::ops::function::FnOnce::call_once
core::ops::function::FnMut::call_mut
canister_update <ic-cdk internal> timer_executor
```

It seems to be happening in the archiving job, which is why it also showed up persistently in other tests (but only in the logs; it didn’t make the call itself fail). It’s still unclear to me why timer_executor can trap. I create the timer like this:

```rust
ic_cdk_timers::set_timer_interval(interval, func);
```

If you have any idea about that, I’ll be grateful :folded_hands:

The backtrace also helped me find the canister trap that caused the test failure itself:

```
[18] [TRAP]: unreachable
Canister Backtrace:
gldt_stake::state::icrc3_prepare_transaction
gldt_stake::updates::manage_stake_position::__canister_method_manage_stake_position::{{closure}}::{{closure}}
ic_cdk_executor::poll_all
canister_update manage_stake_position
```

I’ll investigate this one further.