Set up heartbeat to run a specific function (pulling from a queue) every time it runs
Make sure that the job pulled from the queue fails every time
Turn on heartbeat. Heartbeat will now run forever trying to run the job in the queue, but will forever fail.
Try to stop the canister. The canister won’t stop because heartbeat has outstanding callbacks, but all requests from this point forward will fail with “canister stopping”.
At this point, you cannot call the canister at all (because canister is stopping), the canister will never stop (it can’t because heartbeat is running continuously), and heartbeat can’t stop because there is an item in the queue that keeps failing.
This is what I am affectionately calling the canister death spiral.
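To make the loop easier to reason about, here is a toy model of the steps above in Python. It is only a sketch of the behaviour as I observed it, not the replica's actual logic: `ToyCanister`, `tick`, and `simulate` are made-up names, and the model assumes that heartbeat keeps firing while the canister is stopping and that each failed attempt leaves the job in the queue.

```python
from collections import deque


class ToyCanister:
    """Toy model of the steps above. Not the replica's real logic."""

    def __init__(self, jobs):
        self.queue = deque(jobs)     # jobs the heartbeat tries to process
        self.status = "running"      # running -> stopping -> stopped
        self.open_callbacks = 0      # calls still awaiting a reply

    def call(self):
        # Calls are rejected once a stop has been requested.
        if self.status != "running":
            return "rejected: canister " + self.status
        return "ok"

    def request_stop(self):
        if self.status == "running":
            self.status = "stopping"

    def tick(self):
        """One round: settle the previous call, run heartbeat, try to stop."""
        if self.open_callbacks > 0:
            # The previous attempt fails, so the job stays in the queue.
            self.open_callbacks -= 1
        if self.status != "stopped" and self.queue:
            # Assumption: heartbeat still fires while the canister is stopping.
            self.open_callbacks += 1
        if self.status == "stopping" and self.open_callbacks == 0:
            self.status = "stopped"


def simulate(jobs, rounds=1000):
    canister = ToyCanister(jobs)
    canister.tick()                  # heartbeat is on and already retrying
    canister.request_stop()
    for _ in range(rounds):
        canister.tick()
    return canister.status, canister.call()
```

With one permanently failing job, `simulate(["bad job"])` ends with the status still at "stopping" and every call rejected, however many rounds run. With an empty queue, `simulate([])` reaches "stopped" on the first round after the stop request.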
I know this may not be the best way to implement a queue with heartbeat, but it seems like there should be something I can do to rescue a canister in this state. My current best idea is to wait for the canister to run really low on cycles (down to the freezing threshold) and then regain control of the canister.
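For future canisters, one way to keep a queue from retrying forever is to cap the number of attempts per job. The sketch below shows the idea in Python rather than Motoko, purely as an illustration: `MAX_ATTEMPTS`, `heartbeat_step`, and `dead_letters` are hypothetical names, and it does nothing to rescue a canister that is already stuck.

```python
from collections import deque

MAX_ATTEMPTS = 3  # hypothetical retry cap


def heartbeat_step(queue, dead_letters, handler):
    """Process at most one job; park it after MAX_ATTEMPTS failures."""
    if not queue:
        return
    job, attempts = queue.popleft()
    try:
        handler(job)
    except Exception:
        if attempts + 1 >= MAX_ATTEMPTS:
            # Give up on this job so the queue can drain.
            dead_letters.append(job)
        else:
            # Put the job back with its attempt count bumped.
            queue.append((job, attempts + 1))
```

Each queue entry is a `(job, attempts)` pair. Once the queue is empty the heartbeat has nothing left to call, so in principle no new callbacks are opened and a stop request can complete.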
If that doesn’t work, you should be able to downgrade the Motoko compiler to a version before this pull request and recompile your canister using that.
When you upgrade again you won’t get the canister_pre_upgrade error.
If downgrading is problematic you should be able to remove the relevant lines of code and build the Motoko compiler yourself.
I could do that and send you a build if you can tell me which version you’re using.
I’d also need to know some info on your system architecture. I’m on an M1 MacBook Pro so a similar device would be the most convenient. I also have a Linux VM that I can probably use if necessary.
Hey Paul, I’m working with Bob through this issue. The canister is controlled by my default identity on dfx.
Also, I have an M1, but mine is a MacBook Air 2022, running dfx 0.9.3. Any clue on how to rebuild the Motoko compiler and use it instead of my current compiler?
I should easily be able to do a build of 0.9.3 that removes the trap in the pre upgrade hook based on that. You’d be trusting that I’m not doing anything malicious in a build I send you though.
The pre-upgrade hook gets run on the module that is already running in the canister. Only the post-upgrade hook is called on the new module. There is no way to upgrade the canister if the pre-upgrade hook fails. The Motoko code that traps is in the pre-upgrade hook, so the only way to upgrade that canister (without reinstalling or uninstalling) is to stop all the pending callbacks. If the heartbeat keeps waking up while the canister is stopping, that seems like a bug to me.
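The ordering described above can be sketched as a toy model in Python. This is an illustration of the description in this post, not the replica's code; `Trap`, `upgrade`, and the module dictionaries are made up for the example.

```python
class Trap(Exception):
    """Stands in for a canister trap."""


def strict_pre_upgrade(open_callbacks):
    # Behaviour of the module that is already installed.
    if open_callbacks > 0:
        raise Trap("canister_pre_upgrade attempted with outstanding message callbacks")


def lenient_pre_upgrade(open_callbacks):
    # Behaviour of a module rebuilt without the check.
    pass


def upgrade(installed, new, open_callbacks):
    """Return the module that is installed after the upgrade attempt."""
    try:
        installed["pre_upgrade"](open_callbacks)   # runs the OLD module's hook
    except Trap:
        return installed                           # upgrade aborted
    new["post_upgrade"]()                          # runs the NEW module's hook
    return new


old_module = {"name": "old", "pre_upgrade": strict_pre_upgrade, "post_upgrade": lambda: None}
patched_module = {"name": "patched", "pre_upgrade": lenient_pre_upgrade, "post_upgrade": lambda: None}
```

In this model, `upgrade(old_module, patched_module, open_callbacks=1)` returns `old_module`: the patched hook never runs, because the trap comes from the hook that is already installed. Only with `open_callbacks=0` does the patched module get in.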
In any case, I think the following error message should be reformatted to avoid this issue in the future:
Error: The Replica returned an error: code 5, message: "Canister <canister_id> trapped explicitly: canister_pre_upgrade attempted with outstanding message callbacks (try stopping the canister before upgrade)"
Maybe include a clause telling the user not to try to stop the canister if it uses heartbeat in its current state?