Unable to stop canister

To upgrade our fleet of canisters, we typically stop the canister, perform the upgrade, and then restart it. However, we’ve recently encountered issues with stopping the canister due to timeout errors. Is there a more efficient way to handle this process, or could we increase the timeout to resolve the issue?

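// canister upgrade
// Imports assume the ic-cdk management canister bindings.
use ic_cdk::api::call::CallResult;
use ic_cdk::api::management_canister::main::{
    self, start_canister, stop_canister, CanisterIdRecord, InstallCodeArgument,
};

pub async fn upgrade_canister_util(arg: InstallCodeArgument) -> CallResult<()> {
    let canister_id = arg.canister_id;
    // Stop first; if the stop request fails (e.g. times out), return early and skip the upgrade.
    stop_canister(CanisterIdRecord { canister_id }).await?;
    let install_code_result = main::install_code(arg).await;
    // Restart whether or not the install succeeded, then report the install result.
    start_canister(CanisterIdRecord { canister_id }).await?;
    install_code_result
}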
//canister logs
[0. 2025-02-27T07:58:49.922734439Z]: Failed to upgrade canister limgq-myaaa-aaaao-qbgga-cai. Error: Stop canister request timed out
[1. 2025-02-27T07:58:57.844228987Z]: Failed to upgrade canister m4e67-viaaa-aaaal-ad73q-cai. Error: Stop canister request timed out

I think you can, in a way, increase the timeout by retrying the stop_canister request until it succeeds. If the target canister gets a lot of traffic, this might be worse than increasing the system’s timeout, but I suggest you try this first.
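
A minimal sketch of the retry idea, for illustration only. It is written as plain synchronous Rust so it can run outside a canister; `retry` and `simulated_stop` are hypothetical names, and `simulated_stop` merely stands in for the real stop request. In a canister, the same loop would presumably live in an async function and await `stop_canister` on each attempt.

```rust
/// Calls `op` until it returns `Ok` or `max_attempts` calls have been made.
/// Returns the first success, or the error from the final attempt.
/// `op` receives the 1-based attempt number.
fn retry<T, E>(max_attempts: u32, mut op: impl FnMut(u32) -> Result<T, E>) -> Result<T, E> {
    // Always make at least one attempt.
    let max_attempts = max_attempts.max(1);
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            // Otherwise try again.
            Err(_) => attempt += 1,
        }
    }
}

/// Hypothetical stand-in for a stop request: it "times out" on every
/// attempt before `succeed_on`, then succeeds.
fn simulated_stop(attempt: u32, succeed_on: u32) -> Result<(), String> {
    if attempt >= succeed_on {
        Ok(())
    } else {
        Err("Stop canister request timed out".to_string())
    }
}
```

With these definitions, `retry(5, |n| simulated_stop(n, 3))` succeeds on the third attempt, while `retry(2, |n| simulated_stop(n, 3))` gives up and returns the timeout error. Capping the number of attempts keeps a busy canister from holding up the rest of the fleet indefinitely; the right cap is a judgment call.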

Thank you, @michael-weigelt. However, don’t you think it would make more sense for the asynchronous stop_canister call to return only once the canister has actually stopped? As it stands, the behavior can be quite confusing.

For the protocol, a timeout makes sense. But if you’d like the CDK to retry this for you, then you can suggest it e.g. here.

Yes. I will raise an issue for this.