Canister Backtraces

Hi everyone,
We’re starting to work on support for returning canister backtraces when a canister traps as part of the Beryllium milestone. Hopefully this improves the dev experience by making it easier to debug canister issues. Here are some simple examples of roughly what it will look like:

Example 1:

Consider the following Rust canister:

fn baz(input: u32) -> u32 {
    if input > 10 {
        panic!("uh oh");
    }
    42
}

fn bar_1() -> u32 {
    baz(1)
}

fn bar_2() -> u32 {
    baz(100)
}

fn bar_3() -> u32 {
    baz(10_000)
}

fn foo() -> u32 {
    bar_1() + bar_2() + bar_3()
}

#[ic_cdk::query]
fn get_number() -> String {
    format!("Got number {}", foo())
}

The current error you see when executing the query is:

IC0503: Canister rwlgt-iiaaa-aaaaa-aaaaa-cai trapped explicitly: Panicked at 'uh oh', src/report_backtrace_backend/src/lib.rs:7:9.

So you can see that we hit the panic in baz, but it may not be obvious which of the three calls to baz triggered the panic.

With backtraces enabled the full error would become:

IC0503: Canister rwlgt-iiaaa-aaaaa-aaaaa-cai trapped explicitly: Panicked at 'uh oh', src/report_backtrace_backend/src/lib.rs:7:9.
Backtrace: 
0: ic_cdk::api::trap
1: ic_cdk::printer::set_panic_hook::{{closure}}
2: std::panicking::rust_panic_with_hook
3: std::panicking::begin_panic_handler::{{closure}}
4: std::sys_common::backtrace::__rust_end_short_backtrace
5: rust_begin_unwind
6: core::panicking::panic_fmt
7: report_backtrace_backend::baz
8: report_backtrace_backend::bar_2
9: report_backtrace_backend::foo
10: report_backtrace_backend::get_number_0_::{{closure}}
11: ic_cdk::futures::spawn
12: canister_query get_number  

So now you can see that the panic is coming from the call to baz in bar_2 and why bar_2 was called.

Example 2:

The following canister will attempt to perform an out of bounds access and crash:

use std::ptr::write_volatile;

fn baz(input: u32) -> u32 {
    unsafe { write_volatile(input as *mut u32, 5) };
    42
}

fn bar() -> u32 {
    baz(3_000_000_000)
}

fn foo() -> u32 {
    bar()
}

#[ic_cdk::query]
fn get_number() -> String {
    format!("Got number {}", foo())
}

Currently the error message for this contains no information about where the error occurs:

IC0502: Canister rwlgt-iiaaa-aaaaa-aaaaa-cai trapped: heap out of bounds.

With backtraces we’d be able to see that the oob access occurs in baz:

IC0502: Canister rwlgt-iiaaa-aaaaa-aaaaa-cai trapped: heap out of bounds.
Backtrace: 
0: report_backtrace_backend::baz
1: report_backtrace_backend::bar
2: report_backtrace_backend::foo
3: report_backtrace_backend::get_number_0_::{{closure}}
4: ic_cdk::futures::spawn
5: canister_query get_number

Where to View Backtraces [Edit]

The backtraces will always be dumped to the canister logs. Additionally, the error responses to a message will include the backtrace if the caller has sufficient permissions. One option is to return backtraces only to callers that have permission to read the canister logs (via the log visibility setting).

Timeline

The feature will be available for use with local testing in dfx with the Beryllium milestone and on mainnet afterward. Please follow this thread for announcements on the release in dfx and try it out yourself! Especially if you have specific canister bugs where a backtrace might have helped! And if you currently have a bug where a backtrace could be helpful, please send it to me and I’ll get a backtrace back to you before the feature is released.

Limitations

Note that these backtraces will only contain function names (no source code file names on line numbers). This is because the modifications we make to Wasm modules to measure cycles costs make it trickier to determine actual source code locations and it would also require canisters to include full debug info which would make them much larger. If there’s sufficient interest we could investigate supporting source locations as well though.

Privacy

These backtraces will be included in the error response when a message fails and also dumped to the canister’s logs. In order to not expose canister internals, the backtrace will only be incuded in error responses if the caller is a controller of the canister.

Actionable Error Messages

In addition to the backtraces, we’re trying to make it easier to handle canister errors by including links to error documentation along with the error responses. This documentation will include explanations of the error as well as suggestions on how to fix it. Please let us know if you have suggestions on improvents to those docs!

6 Likes

Nice! Will the stack trace be gracefully included in the canister logging as well?

You mentioned dfx. For those of us who don’t use it, will there be any adjustments required beyond upgrading the local replica?

Please can this be a canister setting rather than be restricted to the controller?

Otherwise this feature is pretty much unusable by SNS projects because calls come from users who are never the controller.

The code of many projects is open source too so those projects aren’t going to worry about leaking the internals.

1 Like

Seconding what Hamish mentioned here, is there a way that canister developers can selectively catch a trap in the canister_on_error Canister Lifecycle Hook and then decide whether or not to write the simple log or log with backtrace to their standard out (or an internal logging library)?

Most of the time a trap doesn’t get triggered by the controller of a canister, so only exposing backtraces if the canister is the controller is a bit limiting.

It would be helpful if the protocol provides this flexibility in accessing back traces through lifecycle hooks or another mechanism.

2 Likes

Yes they will be included in the logs.

What are you using? It’ll be a config option for the replica.

But wouldn’t it still be usable by looking at the canister logs? I’m not sure if want to be adding new settings for the visibility of every new canister feature - maybe it would be cleaner to reuse the log visibility since anyone with access to the logs can get the backtraces anyway. That is, backtraces would be returned only to callers that can view the canister logs.

Great! I’m using the local replica. If it’s well communicated, I suppose I just need to activate it. Glad to hear it will also be parts of the mainnet logs. Thansk for the answers.

Sorry it seems like I didn’t make it clear in the original post that the backtraces will also be visible in canister logs. I’ll edit the original post to explain that.

2 Likes

The flag is already in the master branch here so you could enable it in your tooling now, it just doesn’t do anything yet.

Additionally, you many need to modify the build process for your canisters so that they include the necessary debug info. This means preserving the name custom section which is generated in the original Wasm. Tools like ic-wasm and wasm-opt might remove it, but they have options to keep it: ic-wasm has --keep-name-section and wasm-opt has --debuginfo,-g.

Oh, that’s an important piece of information! Not just for my tooling, but Internet Identity and Oisy Wallet are for example also built in the CI with ic-wasm.

So, do I understand correctly that every time ic-wasm is called, the --keep-name-section parameter needs to be appended?

For example here, I chain four execution of ic-wasm in a row.

Yes, for the optimize, shrink, and metadat subcommands of ic-wasm you’ll need to add that flag on each call (I don’t think it applies for other subcommands). It looks like your example is using the metadata subcommand, so you would need the flag.

1 Like

Gotcha! I’ll also share the info with the team.

@abk will this be available to Motoko canisters?

Doh! I didn’t realise this, but of course that makes perfect sense now.

So everything will work fine for SNSes, they just need to make their canister logs public.

Do you know where the flag --keep-name-section is documented? I do not seem to find any information about it on the ic-wasm repo.

It’s (kind of) documented on the website: https://internetcomputer.org/docs/current/developer-docs/backend/rust/optimizing#optimization-for-binary-size

Yep, it will work for Motoko as well.

1 Like

Good point and thanks for the feedback. I just opened a PR so that the CLI will have better documentation for the flag:

$ ic-wasm --help
Manage metadata in the Wasm module

Usage: ic-wasm <INPUT> metadata [OPTIONS] [NAME]
...
Options:
...
  -k, --keep-name-section        Preserve the `name` section in the generated Wasm. This is needed to display the names of functions, locals, etc. in backtraces or debuggers
2 Likes

Am I safe to assume that if I bump the local replica and ic-starter I have actually nothing to enable - i.e. the flag canister backtrace is automatically enabled, correct?

I did not find any starting parameter in ic-starter to enable it but, it seems used in its code source.

Yes, ic-starter will enable it by default.

1 Like